GCP — Periodic export of MySQL backups to a bucket with Terraform

Using Terraform to create GCP MySQL instances is simple; ten lines of code will create a vanilla MySQL instance:

# Create a database instance
resource "google_sql_database_instance" "default" {
  count            = "${var.create_database}"
  project          = "my-gcp-project"
  name             = "test"
  database_version = "MYSQL_5_7"
  region           = "australia-southeast1"

  settings {
    tier = "db-f1-micro"
  }
}

That is the point of using Terraform. It also means it is easy to destroy your MySQL instance. However, being Terraform, it is just as easy to recreate the instance and get back to the target state by simply restoring the latest database backup to it.

Unfortunately we now cut to the scene where somebody says “I’ve got a bad feeling about this”.

Deleting a GCP MySQL instance also deletes the associated backups — whether they are automated or manual. Therefore if the instance gets destroyed, so do the backups.

The obvious mitigation is to periodically export the database to a bucket to provide a safeguarded backup. There are a few guides on how to do this clickops-style, but clickops is bad.


Let’s get an automated GCP native periodic database export going using Terraform.

Below is a walkthrough of the code with explanations. Complete code can be downloaded from this GitHub repository.

GCP provides a REST API endpoint that can be called to export data from a Cloud SQL instance to a Cloud Storage bucket as a SQL dump or CSV file. In order to set up an automated periodic export, the following pieces are required:

  • A destination bucket
  • A Cloud Function that invokes the REST API
  • A Cloud Scheduler that periodically calls the Cloud Function

We’ll set it up so that all the above elements can be conditionally created with a switch named create_export_function.
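
The variable itself can be as simple as the sketch below (the variable name comes from this walkthrough; the description and default value are illustrative, written in the same Terraform 0.11 style as the rest of the code):

variable "create_export_function" {
  description = "Set to 1 to create the export bucket, Cloud Function and scheduler job; 0 to skip them"
  default     = 1
}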

Here is the easy part — creating a bucket with simple lifecycle rules:

  • anything older than 7 days goes to NEARLINE storage
  • anything older than 30 days goes to COLDLINE storage

We're also going to append a random suffix to ensure the bucket name is unique:

resource "random_id" "db_bucket_suffix" {
count = "${var.create_export_function}"
byte_length = 2
keepers = {
project_id = "${var.project_id}"
}
}
resource "google_storage_bucket" "db_backup_bucket" {
count = "${var.create_export_function}"
name = "${var.db_name}-db-backup-${random_id.db_bucket_suffix.hex}"
project = "${var.project_id}"
location = "australia-southeast1"
versioning = {
enabled = "false"
}
storage_class = "REGIONAL" lifecycle_rule {
action {
type = "SetStorageClass"
storage_class = "NEARLINE"
}
condition {
age = 7
}
}
lifecycle_rule {
action {
type = "SetStorageClass"
storage_class = "COLDLINE"
}
condition {
age = 30
}
}
}

Since the REST API is going to do the heavy lifting, the MySQL instance's service account needs write access to the bucket so the resulting backup file can be written to it.

resource "google_storage_bucket_iam_member" "db_service_account-roles_storage-objectAdmin" {
count = "${var.create_export_function}"
bucket = "${google_storage_bucket.db_backup_bucket.name}"
role = "roles/storage.objectAdmin"
member = "serviceAccount:${google_sql_database_instance.default.service_account_email_address}"
}

Now on to the more complicated pieces and getting Terraform and GCP Cloud Functions to play nicely together.

We'll use a simple NodeJS script that is passed the MySQL instance name and target bucket, and uses them to invoke the database export REST API call.
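
The full script lives in the GitHub repository; below is a minimal sketch of what such a function could look like, assuming the googleapis Node client and the Cloud SQL Admin API v1beta4 export method (the actual code in the repository may differ):

// export_database.js — sketch of an HTTP Cloud Function that triggers a Cloud SQL export
// (assumes the "googleapis" npm package is listed in package.json)
const { google } = require("googleapis");
const sqladmin = google.sqladmin("v1beta4");

exports.exportDatabase = async (req, res) => {
  // Cloud Scheduler POSTs the parameters rendered by the template_file data source;
  // depending on the Content-Type they may arrive as parsed JSON or as a raw Buffer
  const params = Buffer.isBuffer(req.body) ? JSON.parse(req.body.toString()) : req.body;

  // Authenticate as the Cloud Function's runtime service account
  const auth = await google.auth.getClient({
    scopes: ["https://www.googleapis.com/auth/cloud-platform"],
  });

  try {
    // Ask the Cloud SQL Admin API to export the instance as a SQL dump into the bucket
    const result = await sqladmin.instances.export({
      auth: auth,
      project: params.project_name,
      instance: params.mysql_instance_name,
      requestBody: {
        exportContext: {
          kind: "sql#exportContext",
          fileType: "SQL",
          uri: `gs://${params.bucket_name}/export-${Date.now()}.sql.gz`,
        },
      },
    });
    res.status(200).send(JSON.stringify(result.data));
  } catch (err) {
    res.status(500).send(err.toString());
  }
};

Pointing the uri at a file ending in .gz tells Cloud SQL to compress the dump before writing it to the bucket.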

To assemble this there are a few pieces that need to be aligned to work together.

1. Terraform can create GCP Cloud Functions

The Terraform resource google_cloudfunctions_function will create a Cloud Function in GCP.

2. GCP Cloud Functions use source from a zip in a bucket

A Cloud Function can be created either from a Cloud Source repository or from zipped source code in a bucket. We'll use a zip that will be created under /code/ in the bucket we created earlier.

3. Terraform can create and update the zip

Whenever the NodeJS code is updated, we want Terraform to recreate the zip and push the updated code to the Cloud Function. This can be implemented using the archive_file data source in Terraform.

4. GCP only updates the Cloud Function if the zip has a different name

Overwriting the zip will not trigger the Cloud Function to update itself. It will only update when the filename of the zip changes.

The simple solution is to append the MD5 of the source JavaScript file to the zip name in the bucket. If the source changes, the zip file name changes and the Cloud Function is triggered to update.

Note: appending the MD5 of the zip to the name of the local zip created by Terraform makes for noisy Terraform plan/apply runs, as the archive_file data source has to be recreated each time, which Terraform lists in the plan even though there are no actual changes. In a pipeline this creates Terraform output noise and increases the probability that smaller changes will get missed during reviews.

Use the archive_file data source in Terraform to zip the local source code for upload:

# create local zip of code
data "archive_file" "function_dist" {
  count       = "${var.create_export_function}"
  output_path = "./dist/export_function_source.zip"
  source_dir  = "./app/"
  type        = "zip"
}

Determine the MD5 of the main source file and append that to the uploaded zip file name:

# upload the zipped source (named with the MD5 of the source file) to the GCP bucket
resource "google_storage_bucket_object" "cloudfunction_source_code" {
  count      = "${var.create_export_function}"
  depends_on = ["data.archive_file.function_dist"]
  name       = "code/export_database-${lower(replace(base64encode(md5(file("./app/export_database.js"))), "=", ""))}.zip"
  bucket     = "${google_storage_bucket.db_backup_bucket.name}"
  source     = "./dist/export_function_source.zip"
}

Using the zip file as the source, create a Cloud Function that will invoke the REST API to export the database to the bucket:

# create the Cloud Function, using the MD5-named zip as its source
resource "google_cloudfunctions_function" "export_database_to_bucket" {
  count       = "${var.create_export_function}"
  depends_on  = ["google_storage_bucket_object.cloudfunction_source_code"]
  project     = "${var.project_id}"
  region      = "asia-northeast1"
  name        = "export_database_to_bucket"
  description = "[Managed by Terraform] This function exports the main database instance to the backup bucket"

  available_memory_mb   = 128
  source_archive_bucket = "${google_storage_bucket.db_backup_bucket.name}"
  source_archive_object = "code/export_database-${lower(replace(base64encode(md5(file("./app/export_database.js"))), "=", ""))}.zip"
  runtime               = "nodejs8"
  entry_point           = "exportDatabase"
  trigger_http          = "true"
}

The final piece is to create a Cloud Scheduler as a cron job to invoke the Cloud Function created above with the right parameters to export the database to the bucket.

There are three elements:

  • google_compute_default_service_account — used to provide authentication to invoke the Cloud Function
  • template_file — provides parameters needed by the Cloud Function
  • google_cloud_scheduler_job — the actual job in Cloud Scheduler that will invoke the Cloud Function
data "google_compute_default_service_account" "default" {
project = "${var.project_id}"
}
data "template_file" "cloudfunction_params" {
count = "${var.create_export_function}"
template = <<EOF
{
"project_name": "${var.project_id}",
"mysql_instance_name": "${google_sql_database_instance.default.name}",
"bucket_name": "${google_storage_bucket.db_backup_bucket.name}"
}
EOF
}
resource "google_cloud_scheduler_job" "db_export_trigger" {
provider = "google-beta"
count = "${var.create_export_function}"
depends_on = ["google_storage_bucket_object.cloudfunction_source_code"]
project = "${var.project_id}"
name = "db-export-scheduler-job"
schedule = "0 8,18 * * *"
description = "Exports the database at 8am and 6pm"
time_zone = "Australia/Melbourne"
http_target = {
uri = "${google_cloudfunctions_function.export_database_to_bucket.*.https_trigger_url[0]}"
http_method = "POST"
body = "${base64encode(data.template_file.cloudfunction_params.rendered)}"oidc_token = {
service_account_email = "${data.google_compute_default_service_account.default.email}"
}
}
}

And voila — we now have periodic exports of the MySQL database backups to a bucket using minimal native GCP elements.