Scheduling and running Amazon RDS jobs with AWS Batch and Amazon CloudWatch rules

Database administrators and developers traditionally schedule scripts to run against databases using the system cron on the host where the database is running. As a managed database service, Amazon Relational Database Service (RDS) does not provide access to the underlying infrastructure, so if you migrate such workloads from on premises, you must move these jobs. This post provides an alternate way to schedule and run jobs centrally.

AWS Batch is a managed service that abstracts the complexities of provisioning, managing, monitoring, and scaling your computing jobs, and enables you to easily and efficiently run jobs on AWS. Additionally, AWS Batch enables you to build jobs using the language of your choice and deploy it as a Docker container.

This post demonstrates how to use the combination of AWS Batch and Amazon CloudWatch rules to dynamically provision resources and schedule and run functions or stored procedures on PostgreSQL database. The same process can be used to run job on any Amazon RDS database.

Overview of solution

The following diagram illustrates the architecture of the solution.

Prerequisites

Before you get started, complete the following prerequisites:

Walkthrough

The following steps provide a high-level overview of the walkthrough:

  1. Clone the project from the AWS code samples repository
  2. Deploy the CloudFormation template to create the required services
  3. Go to the AWS CloudFormation console and make sure that the resources are created
  4. Run database scripts and create the required tables and functions
  5. Build, tag, and push the Docker image to Amazon ECR
  6. Verify if AWS Batch is running the job successfully based on the CloudWatch rule

This post also includes optional instructions to manage changes to the job and schedule with AWS CodeCommit and AWS CodeBuild.

Cloning source code from AWS samples

Download the files required to set up the environment. See the following code:

$ git clone https://github.com/aws-samples/aws-batch-rds-job-scheduling

$ cd aws-batch-rds-job-scheduling

Deploying the CloudFormation template

To run the CloudFormation scripts, complete the following steps:

  1. On the Amazon VPC console, navigate to the Subnets section.
  2. Record the Subnet IDs of the VPC that you will be using.
  3. Record the Security group ID that’s attached to the Subnet and VPC you recorded in the above step, as shown in the screenshot preview below.
  4. Update the comma-separated list of the default Subnets and security groups as input parameters in the batchenv-cf.yaml.

Run the CloudFormation template to provision the required services. See the following code:

$ aws cloudformation create-stack --stack-name batchjob --template-body file://batchenv-cf.yaml --capabilities CAPABILITY_NAMED_IAM --region us-east-1
{
"StackId": "arn:aws:cloudformation:us-east-1:XXXXXXXXXXXXXX:stack/batchjob/73448940-63c5-11ea-918d-1208f0f76cbf"
}

The template creates the following:

  • Docker registry to store the Docker image
  • Job definition to define the Docker image, IAM role, and resource requirements for the job
  • Queue for jobs until they are ready to run in a compute environment
  • Compute environment in which AWS Batch manages the compute resources that jobs use
  • PostgreSQL instance
  • AWS Secrets Manager with PostgreSQL database login credentials
  • CloudWatch rule to run the AWS Batch job based on the schedule
  • Roles with appropriate permission

The following are ancillary services, which are required only if you choose to manage changes to the job and schedule rule using CodeCommit and CodeBuild:

  • Repository to store buildspec.yml and src folder
  • A CodeBuild project to build, tag, and push Docker images to the registry

Services are created with the CloudFormation stack name as a prefix. The following screenshot shows the details of a successful CloudFormation deployment.

Running database scripts

To run the database scripts, complete the following steps:

  1. On the Amazon RDS console, under Database, choose Connectivity & Security.
  2. Record the database endpoint URL and port as shown in the screenshot preview below.
  3. On the Secrets Manager console, under Secrets, choose your secret.
  4. Choose Retrieve secret value.
  5. Make a note of your database credentials.
  6. Connect to the PostgresSQL database that the CloudFormation template provisioned.
  7. Download the SQL script CreateSampleDataAndSP.sql from GitHub. Run this script to create the following objects in your database:
  • DEPT – Table
  • EMP – Table
  • LOW_HIGH_SALARIES – Function

Building, tagging, and pushing the Docker image to Amazon ECR

To build, tag, and push your Docker image to Amazon ECR, complete the following steps:

  1. In your local machine, navigate to the folder with the downloaded source code. See the following code:
    $ cd aws-batch-rds-job-scheduling/src

  2. Open this file and change the following value (if you chose a different stack name and Region when you deployed the CloudFormation template). See the following code:
    secret_name = "batchjob-secret"
    region_name = "us-east-1"

    Note that the Python script (src/runjob.py) that connects and runs the database job is configured to look for the database secret name with the prefix batchjob in US-East-1.

  3. To connect to Amazon ECR, enter the following code:
    $(aws ecr get-login --region us-east-1 --no-include-email)

  4. To package the Python script and all libraries mentioned in requirements.txt as a Docker container, enter the following code:
    $ docker build -t batchjob-ecr .

  5. On the Amazon ECR console, under Repositories, choose your repository link.
  6. To get your environment-specific commands to tag and push, choose View Push Commands.
    The following screenshot shows the locations of your push commands.
  7. Enter the code listed in Step 3 of the preceding screenshot. See the following example code:
    $ docker tag batchjob-ecr:latest 765500136580.dkr.ecr.us-east-1.amazonaws.com/batchjob-ecr:latest

  8. Enter the code listed in Step 4 of the preceding screenshot. See the following example code:
    $ docker push 765500136580.dkr.ecr.us-east-1.amazonaws.com/batchjob-ecr:latest

  9. Verify if you pushed the Docker image from your local machine to Amazon ECR.
     The following screenshot shows the uploaded docker image on the Amazon ECR console.

Running the AWS Batch job manually

To run the AWS Batch job manually, complete the following steps.

  1. On the AWS Batch console, choose Jobs.
  2. Choose Submit job.
  3. For Job name, enter a name for the job.
  4. For Job definition, choose the job definition.
  5. For Job queue, choose the job queue.
  6. Modify the vCPU and memory settings (the default settings are sufficient for this walkthrough).
  7. Choose Submit job.
  8. Under Jobs, choose the job ID link and view the logs to make sure the job completed successfully.
    The following screenshot shows the successful completion of the job.
    The AWS Batch job is configured to run the following function and display the highest and lowest salary of employees in the department provided as input (in the runjob.py script):
    # Call stored procedure
    cur.callproc('low_high_salaries', (10,))
    

    You can validate the jobs using the CloudWatch Logs that AWS Batch produced. The following screenshot shows the job details via the AWS Batch console.
    The following screenshot shows the CloudWatch Logs for the job.

Verifying the scheduled run of AWS Batch jobs

The CloudFormation template also creates a CloudWatch rule to run the job on a schedule. You can change the schedule by editing the rule.

  1. On the CloudWatch console, under Rules, choose your rule.
  2. From the Actions drop-down menu, choose Edit.
  3. Change the schedule based on your requirements.
  4. For the input of the batch job queue, choose Use existing role.
  5. Choose BatchServiceRole.
  6. Choose Configure details.
  7. Choose Update rule.
  8. Monitor the scheduled job on the Dashboard page.
  9. To confirm that the job succeeded, on the Jobs page, choose succeeded.

Cleaning up

On the AWS Management Console, navigate to your CloudFormation stack batchjob and delete it.

Alternatively, enter the following code in AWS CLI:

$ aws cloudformation delete-stack --stack-name batchjob

Managing changes to the job and schedule rule in CodeCommit and CodeBuild

You can use the following steps to manage changes to Docker image instead of building, tagging, and pushing it manually to ECR.

To commit code to the CodeCommit repository, complete the following steps:

  1. On the console, under Developer Tools, choose CodeCommit.
  2. Choose Repositories.
  3. Choose batch-job-codecommit.
  4. Choose Clone URL.
  5. Choose Clone HTTPS.
  6. Clone the batchjob-codecommit See the following code:
    $ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/batchjob-codecommit

  7. Copy the src folder and buildspec.yml you download from the AWS samples into the repository you cloned from CodeCommit.
    The src folder contains the Python code to connect and run functions. The Python script (src/runjob.py) is configured to look for database secrets with the batchjob prefix (for example, secret_name = "batchjob-secret"). If you chose a different prefix, you must change this value. See the following code:
    $ cp -Rf aws-batch-rds-job-scheduling/src ./batchjob-codecommit/.
    $ cp -Rf aws-batch-rds-job-scheduling/buildspec.yml ./batchjob-codecommit/.
    $ cd ./batchjob-codecommit/
    $ git add --all
    $ git commit -m "first commit"
    $ git push origin master

    The following screenshot shows the successful code upload to your repository.
    Click on the Build run to monitor the build.
    The following screenshot shows the build logs.
    Successful completion of the build job pushes a Docker image with Python script and other libraries to Amazon ECR.

Conclusion

This post demonstrated how to integrate different AWS services to schedule and run jobs on a PostgresSQL database. You can run jobs or orchestrate complex job workflows on any RDS database with the same solution by including the compatible python adapter in the docker container and importing it in your python script.

Additionally, this solution helps you manage changes to the job and schedules using the CI/CD toolchain provisioned along with AWS Batch and CloudWatch rules.

 

 


About the Authors

 

Udayasimha Theepireddy (Uday) is a Senior Cloud Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on their large scale migrations, helping them improving the value of their solutions when using AWS.

 

 

Vinod Ramasubbu is a Cloud Application Architect with Amazon Web Services. He works with customers to architect, design, and automate business softwares at scale on AWS cloud