Amazon Web Services (AWS) offers Amazon Relational Database Service (RDS) and Amazon Aurora as fully managed relational database services. With a few commands, you can have your production database instance up and running on AWS.
An online database frees the database administrator (DBA) from many maintenance and management tasks. However, there are a few significant responsibilities to be aware of. This post discusses the DBA tasks to perform on Amazon RDS for PostgreSQL and Aurora with PostgreSQL-compatible databases.
As a DBA, you face daily pressure to deliver value to your business across many fronts. Maintaining the right platform for running mission-critical databases is becoming increasingly difficult. Maintenance is also a challenging job.
The launch of Amazon RDS and Aurora has vastly reduced the time you spend on tasks like installation, configuration, monitoring, and security. Nevertheless, you must still carry out several critical tasks: several of them daily, a few weekly, and some only at the time of Amazon RDS or Aurora installation (at the time of instance creation).
Some of the administrative tasks that you must carry out include:
- Configuring the parameter group
- Managing IP traffic using a security group
- Auditing the database log files
- Maintenance and management activities
- Planning backup and recovery strategies
- User management
- Monitoring the database
Configuring the parameter group
The data directory of an on-premises PostgreSQL cluster contains the configuration file
postgresql.conf. You can manage the parameters through this configuration file. Similarly, for Amazon RDS and Aurora PostgreSQL instances, you manage the parameters through a parameter group.
Before you create a new Amazon RDS and Aurora instance, customize your DB parameter group. For more information about creating a new parameter group, modifying the parameters, and attaching it to the instance, see Working with DB Parameter Groups.
If you do not have a customized parameter group at the time of creation, you can perform an instance restart. Replace the default DB parameter group with the custom parameter group, which allows the customized parameters to take effect.
The following overview describes which parameters you should turn on for optimal performance:
Enter the following logging parameters:
Enter the following autovacuum parameters:
Enter the following as other parameters:
Managing IP traffic using a security group
In Amazon RDS and Aurora, the security group controls the traffic in and out of the instance. It controls both incoming and outgoing traffic by applying appropriate rules to the security group.
For example, the following screenshot shows how you can allow PG traffic from your applications to the database via port 5432:
Do not open your database to the world by using 0.0.0.0/0.
Auditing the database log files
The busier your database is, the higher the number of transactions. The more transactions, the more logs it generates. The more log files, the more complicated it becomes to extract specific information from those log files.
Most DBAs review their log files as a last resort, but you should turn to them frequently for the
HINTS messages they contain. It is vital to check and audit the log files regularly.
When it becomes difficult to analyze the log files every day due to size, you can use pgBadger, which is available on GitHub. pgBadger is an open-source PostgreSQL log analyzing tool that generates HTML reports from your PostgreSQL log file.
By default, RDS and Aurora instances retain logs for 3–7 days. Run custom bash scripts to download the log files locally or to an Amazon EC2 instance or an Amazon S3 bucket to maintain log files for a longer period.
To install and generate pgBadger reports, complete the following steps:
- Sign in to the AWS Management Console and create one EC2 RHEL or CentOS instance.
- Download the pgdg repo on Amazon EC2.
- To install, enter the following code:
This post tested the preceding steps on RHEL 7 with pgdg-10 (PostgreSQL repo).
To generate the report, complete the following steps:
- Download the PostgreSQL log files from Amazon RDS or Aurora to Amazon EC2 and run pgBadger.
- Enable the logging parameters in your DB parameter group.
- Schedule a cron job to download the log files to an appropriate location on Amazon EC2 and generate the pgBadger report.
- Download and convert your log files with the following code:
This script generates the pgbadger report that you can use to analyze the activities performed on the database. For a sample pgBadger report, see postgres_sample.
Maintenance and management activities
A remote database still requires maintenance. The following section discusses autovacuuming, the
VACUUM ANALYZE command, and long-running queries and sessions.
Query slowness due to table or index bloat is one of the most common scenarios in PostgreSQL. Amazon RDS and Aurora enable autovacuuming by default to reduce this bloat. As you manage slowdown, keep the following in mind:
- Autovacuum holds a less-priority lock on the table. It might cancel its own job when another high-priority operation wants to acquire a lock on the table.
- The same table can become a candidate for repeated autovacuums, which causes other tables to remain bloated.
Because these are the common scenarios in PostgreSQL, you should tune your autovacuum parameter properly. If tuning does not work, you must schedule a manual vacuum/analyze script. Based on the frequency of the bloat, you can decide whether to perform
VACUUM FULL, or
Scheduling VACUUM ANALYZE
To keep the stats updated, remove bloat in reused space, and avoid the transaction wraparound, schedule
VACUUM ANALYZE on your database.
VACUUM removes the bloat and avoids transaction wraparound.
ANALYZE helps to update the database stats, which helps the planner generate good plans for queries.
Before you proceed, you should understand the differences between
VACUUM FULL, and
- VACUUM ANALYZE – Removes the bloat from the tables and indexes and updates the tables’ statistics. This is a non-locking operation; you can run it at a table level or database level. It cleans the bloated pages but does not reclaim the space.
- VACUUM FULL – Writes the entire content of the table into a new disk file and releases the wasted space back to OS. This causes a table-level lock on the table and slow speeds. Avoid using
VACUUM FULLon a high-load system.
- PG_REPACK – Writes the entire content of the table into a new disk file and releases the wasted space back to OS and does it online without holding the lock on the table. It is faster than
VACUUM FULL, and Amazon Aurora and Amazon RDS support it as an extension. Instead of re-indexing or performing a
VACUUM FULL, you should use
PG_REPACKto back up.
PG_REPACKis available as an extension in Amazon Aurora for PostgreSQL and Amazon RDS PostgreSQL.
The following code calculates the bloat and extra space that bloated pages occupy:
You receive the following code as output:
To reclaim the space, run
VACUUM FULL or
After you run
VACUUM FULL, the query returns something similar to the following output:
VACUUM FULL and re-indexing are locking operations that block other sessions, but
PG_REPACK is an online method to reorganize the tables and indexes. You can query the
pg_stat_user_tables to check the last autovacuum or manual vacuum execution.
For example, see the following code:
You receive the following code as output:
You can also use this code:
You receive the following code as output:
VACUUM ANALYZE on a table, enter the following code:
VACUUM ANALYZE on the database, enter the following code:
Only the superuser or database owner can run a vacuum on system tables. If substantial bloat in system tables causes performance degradation, or when you must free up bloated space to the disk, you must run
VACUUM FULL. Only run this command outside of business hours, because it locks the tables on which it runs.
To check the transactional age of the database, enter the following code:
To prevent transaction wraparound issues in the database, enter the following code:
The autovacuum process can also perform these activities, and it is highly recommended that you keep it enabled. Amazon RDS for PostgreSQL has autovacuuming enabled by default.
Make sure that you tune the autovacuum parameters to best suit your requirements. In Amazon RDS, the parameter
rds.adaptive_autovacuum helps automatically tune the autovacuum parameters whenever the database exceeds the transaction ID thresholds.
Enter the following code to check if autovacuum is running in PostgreSQL version 9.6 and above:
Long-running queries and sessions
To terminate queries that have run for a long time or are blocking another session, check the PID of the query from the
pg_stat_activity table. To kill the query, run the following commands.
To cancel the query without disconnecting the connection, enter the following code:
To terminate the connection and cancel all other queries in that connection, enter the following code:
To cancel the running queries, always use
PG_CANCEL_BACKEND. If the query is stuck and locking other processes, you can use
PG_TERMINATE_BACKEND. After termination, you might need to re-run the session again to establish the connection.
Planning backup and recovery strategies
Unlike on-premises databases, which require manual backup and recovery, Aurora for PostgreSQL and RDS PostgreSQL instances have built-in features to automate backups using snapshots. You must enable these during the creation of the Amazon RDS or Aurora instance. Amazon RDS creates a storage volume snapshot to back up the entire database instance.
When you create a DB snapshot, you must identify which DB instance you want to back up, and then give your DB snapshot a name so you can restore from it later. The amount of time it takes to create a snapshot varies depends on the size of your databases. For more information, see Restoring from a DB Snapshot.
User management is one of the most critical admin tasks, and you must perform it with utmost care. When you create a new Amazon RDS PostgreSQL or Aurora for PostgreSQL instance, it creates an
RDS_SUPERUSER role. This is similar to the PostgreSQL user of a typical PostgreSQL instance, but with a few limitations.
You can manage users that connect to the database by setting appropriate permission levels. In a default PostgreSQL environment, you can manage user connection through the
pg_hba.conf file, but in Amazon RDS for PostgreSQL, you must use
GRANT/REVOKE. You can also assign access and privileges to users at a schema level or table level. You can decide on what kind of privileges you want to provide to the users. For more information, see Managing PostgreSQL users and roles.
Monitoring the database
Monitoring is an integral part of maintaining the reliability, availability, and performance of Amazon RDS and your AWS solutions. Collect monitoring data from all the parts of your AWS solution so that you can debug a multi-point failure if one occurs. One of the major tasks is to set up a detailed level of monitoring for your Amazon RDS and Aurora instances.
Amazon Aurora and Amazon RDS offer two types of monitoring by default: Amazon CloudWatch and Amazon RDS Performance Insights.
Monitoring with CloudWatch
CloudWatch offers the following metrics available for Amazon RDS and Aurora PostgreSQL:
- High CPU or RAM consumption
- Disk space consumption
- Network traffic
- Database connections
- IOPS metrics
- Maximum Used Transaction IDs
- Queue Depth
For more information, see Monitoring Amazon Aurora DB Cluster Metrics.
CloudWatch has many metrics available to monitor the health of the Amazon RDS and Aurora instances at the hardware level. However, you must configure Amazon SNS (alarm) on each metric.
Monitoring with Performance Insights
Amazon RDS Performance Insights employs lightweight data collection methods without impacting the performance of your applications to tune the database for performance.
Performance Insights offers the following metrics:
- CPU Utilization – Wait, Idle, Steal, Nice
- Disk I/O – Read KbPS, Write IOsPS
- Load Average
- Swap – Cached, Free, Total
- Cache – blocks hit, buffers allocated
- Checkpoint – Checkpoint timed, buffers checkpoint, checkpoint write latency
For more information, see Performance Insights for Amazon RDS for PostgreSQL.
This post shared a few common administrator responsibilities on Amazon RDS and Aurora for PostgreSQL databases. This provides a basic framework that you can implement on your test and production workloads. The post also highlights logging and log auditing for better management of the instances.
If you have questions or comments about this post, post your thoughts in the comments.
About the Author
John Solomon is a Consultant with AWS Global Competency Center India, working closely with customers who are migrating from on-premises to the AWS Cloud. He is an AWS certified speaker and speaks at various meetups, breakout sessions, webinars, etc. He is an ardent member of the PostgreSQL community and works as a database administrator for PostgreSQL databases.