Introducing the Well-Architected Framework for Machine Learning

We have published a new whitepaper, Machine Learning Lens, to help you design your machine learning (ML) workloads following cloud best practices. This whitepaper gives you an overview of the iterative phases of ML and introduces you to the ML and artificial intelligence (AI) services available on AWS using scenarios and reference architectures.

How often are you asking yourself “Am I doing this right?” when building and running applications in the cloud? You are not alone. To help you answer this question, we released the AWS Well-Architected Framework in 2015. The Framework is a formal approach for comparing your workload against AWS best practices and getting guidance on how to improve it.

We added “lenses” in 2017 that extended the Framework beyond its general perspective to provide guidance for specific technology domains, such as the Serverless Applications Lens, High Performance Computing (HPC) Lens, and IoT (Internet of Things) Lens. The new Machine Learning Lens further extends that guidance to include best practices specific to machine learning workloads.

A typical question we often hear is: “Should I use both the Lens whitepaper and the AWS Well-Architected Framework whitepaper when reviewing a workload?” Yes! We purposely exclude topics covered by the Framework in the Lens whitepapers. To fully evaluate your workload, use the Framework along with any applicable Lens whitepapers.

Applying the Machine Learning Lens

In the Machine Learning Lens, we focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud. The whitepaper starts by describing the general design principles for ML workloads. We then discuss the design principles for each of the five pillars of the Framework—operational excellence, security, reliability, performance efficiency, and cost optimization—as they relate to ML workloads.

Although the Lens is written to provide guidance across all pillars, each pillar is designed to be consumable by itself. This design results in some intended redundancy across the pillars. The following figure shows the structure of the whitepaper.

ML Lens-Well Architected (1)

The primary components of the AWS Well-Architected Framework are Pillars, Design Principles, Questions, and Best Practices. These components are illustrated in the following figure and are outlined in the AWS Well-Architected Framework whitepaper.

WA-Failure Management (2)

The Machine Learning Lens follows this pattern, with Design Principles, Questions, and Best Practices tailored for machine learning workloads. Each pillar has a set of questions, mapped to the design principles, which drives best practices for ML workloads.

ML Lens-Well Architected (2)

To review your ML workloads, start by answering the questions in each pillar. Identify opportunities for improvement as well as critical items for remediation. Then make a prioritized plan to address the improvement opportunities and remediation items.

We recommend that you regularly evaluate your workloads. Use the Lens throughout the lifecycle of your ML workload—not just at the end when you are about to go into production. When used during the design and implementation phase, the Machine Learning Lens can help you identify potential issues early, so that you have more time to address them.

Available now

The Machine Learning Lens is available now. Start using it today to review your existing workloads or to guide you in building new ones. Use the Lens to ensure that your ML workloads are architected with operational excellence, security, reliability, performance efficiency, and cost optimization in mind.