What’s New in the Well-Architected Operational Excellence Pillar?

We recently released an updated version of the Operational Excellence pillar of the AWS Well-Architected Framework. It includes expanded guidance on operating models and organizational culture, as well as several smaller refinements.

Gerald Weinberg, in his 1985 book, The Secrets of Consulting, defined The Second Law of Consulting as “No matter how it looks at first, it’s always a people problem.” I appreciate the humor and wit with which he approached software development.

Organizations, teams, and team members have always been a significant part of the Operational Excellence (OE) pillar. They now have their own dedicated section, Organization, which includes guidance on Organization Priorities, Operating Model, and Organizational Culture.

How do you structure your organization to support your business outcomes?

Regardless of the choice of methodology—DevOps, ISO, ITIL, or something else—we all ultimately perform the same activities. The single most significant difference is in how we organize responsibilities and teams. Understanding the relationships between individuals and teams is essential to the smooth flow of work through your organization.

Starting from a simplified view of an organization limited to the engineering and operations of both applications and infrastructure (or platform), we explore the tradeoffs between common operating models.

[Figure: Operational Excellence 2 by 2 operating model diagram]

Using variations on this 2 by 2 diagram, where responsibilities might span multiple quadrants, we discuss who does what, and the relationships between teams, governance, and decision-making. A well-defined set of responsibilities reduces the number of conflicting and redundant efforts. Business outcomes are easier to achieve when there is strong alignment and relationships between business, development, and operations teams.

Your teams might use different operating models, based on their capabilities and needs. You can be successful with any operating model, but some operating models have advantages for, or are better suited to, your individual teams. Mapping who does what across your teams can provide significant insight into how they contribute to or impact the flow of work. These insights can help you identify opportunities for improvement, as well as opportunities to further leverage the existing capabilities and innovation of your teams.
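As a rough illustration of mapping who does what, the sketch below records which teams cover each quadrant of the simplified 2 by 2 view and reports quadrants with no owner or with more than one. The team names, the `find_gaps_and_overlaps` helper, and the data layout are all hypothetical and are not part of the Well-Architected Framework. An overlap is not necessarily a problem, since responsibilities might legitimately span quadrants; it is simply a place to confirm that the relationship between teams is well defined.

```python
from collections import defaultdict

# The four quadrants of the simplified 2 by 2 view:
# (applications or platform) x (engineering or operations).
QUADRANTS = [
    ("application", "engineering"),
    ("application", "operations"),
    ("platform", "engineering"),
    ("platform", "operations"),
]


def map_responsibilities(assignments):
    """Group team names by the quadrant they are responsible for."""
    owners = defaultdict(list)
    for team, quadrants in assignments.items():
        for quadrant in quadrants:
            owners[quadrant].append(team)
    return owners


def find_gaps_and_overlaps(assignments):
    """Return quadrants with no owner and quadrants with several owners."""
    owners = map_responsibilities(assignments)
    gaps = [q for q in QUADRANTS if not owners.get(q)]
    overlaps = {
        q: sorted(owners[q]) for q in QUADRANTS if len(owners.get(q, [])) > 1
    }
    return gaps, overlaps


# Hypothetical assignments, for illustration only.
assignments = {
    "app-team": [("application", "engineering"), ("application", "operations")],
    "platform-team": [("platform", "engineering")],
    "ops-team": [("application", "operations")],
}

gaps, overlaps = find_gaps_and_overlaps(assignments)
print("Unowned quadrants:", gaps)
print("Shared quadrants:", overlaps)
```

In this made-up example, platform operations has no owner, and application operations is shared by two teams. Both findings are prompts for a conversation about responsibilities rather than automatic defects.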

How does your organizational culture support your business outcomes?

The standards set by your leadership shape your organizational culture, so we begin by looking at Executive Sponsorship. Senior leadership sets expectations for the organization and evaluates success. When senior leadership is the sponsor, advocate, and driver for the adoption of best practices, as well as the evolution of the organization, both are more likely to happen. Our experience has shown that strong executive sponsorship is key to success.

“Diverse opinions are encouraged and sought within and across teams” is our top best practice in Organizational Culture. Leverage cross-organizational diversity to seek multiple unique perspectives. Use these perspectives to increase innovation, challenge your assumptions, and reduce the risk of confirmation bias. Grow inclusion, diversity, and accessibility within your teams to gain beneficial perspectives. These actions call back to the strong alignment and relationships across teams that we discussed with operating models. When you engage diverse perspectives and experiences, the insights you gain are more likely to identify new challenges, new opportunities, and new solutions.

While small in scope, other changes to OE represent significant areas for improvement.

We start out with the subtle addition of “Evaluate governance requirements” alongside the existing “Evaluate compliance requirements.” With compliance, the focus is on the obligations that come from outside your organization, for example the regulatory requirements for your industry. With governance, the focus is on the requirements that your organization applies to you and your workload.

“Use a process for root cause analysis” has been renamed “Perform post-incident analysis” and moved to “How do you evolve operations?” This change makes it part of continual improvement, where it fits better given the timing of the activity and where it receives more emphasis. Regardless of what you call it, if your goal is to identify the contributing factors and mitigate or prevent recurrence, you are doing the right thing.

The final addition to OE is “Perform Knowledge Management.” It’s a difficult challenge that can have simple solutions. Your team members need to be able to discover the information that they are looking for, access it, and identify that it’s current and complete. It’s much better to capture institutional knowledge now, so that the burden carried by key team members can be shared. If a key team member is tragically struck by a winning lottery ticket and forced to become a person of leisure, what will you do? Perhaps right now is the best time to get started on a knowledge management effort.

This update to the Operational Excellence pillar of the AWS Well-Architected Framework gives you and your teams the tools and information you need to improve your operations and governance. Together with the AWS Well-Architected Tool, use them to continue to learn, measure, and improve your cloud workloads.

Special thanks to everyone across the AWS operations community who contributed their diverse perspectives and experiences to improving the Operational Excellence pillar.

Learn more about the new version of Well-Architected and its pillars