How to Run an Operational Excellence Review for Software Engineering

Most engineering organizations already run something they call an operational review. It usually looks like a cousin of the quarterly business review: a deck assembled every few months, walked through team by team, anchored on whatever incidents happened to land in the previous quarter. By the time leadership sees the data, the systems it describes have moved on and the next set of risks is already accumulating in the gap.

The cadence was never really appropriate. A quarterly snapshot of a system that changes continuously is out of date the moment it's taken; the lag was just survivable back when fewer changes landed between reviews. AI-assisted software development didn't introduce that flaw so much as make it impossible to keep ignoring, because the model underneath was always measuring the wrong unit. What follows is a working playbook for what an Operational Excellence review should actually look like, and how to run one that produces decisions rather than decks.

What an operational excellence review is (and what most orgs get wrong)

An Operational Excellence (OpEx) review is a recurring, organization-level evaluation of how well your engineering org turns customer needs into reliable, secure software at a sustainable cost. The idea comes from manufacturing, where operational excellence means treating the whole factory as a single observable system to be measured and improved, rather than a set of stations optimized in isolation.

What software engineering is borrowing from the factory

Once a line could produce a thousand units in the time it used to take to build one, optimizing processes in isolation stopped mattering; what mattered was the throughput and quality of the whole line. The response was to turn factory management into its own discipline, what the West came to call lean manufacturing. The Toyota Production System, Six Sigma, and lean management philosophies treat the line as one system, subject to continuous improvement, and emphasize employee empowerment to push structured problem-solving down to the people closest to the work. The Shingo Institute distilled these practices into the Shingo Model, a set of principles, including the drive to seek perfection through continuous improvement, aimed at one outcome: to create value for the customer with little waste. AI has handed software engineering the same step-change in output, and the same need to manage the whole system rather than the individual contributor.

Many software engineering orgs get the scope wrong. They run the review at the level of one team's quarter instead of the organization's posture. The result is a familiar failure mode: every team passes its own scorecard, and the organization still ships unreliable software. That happens because the unit of measurement is wrong. Individual productivity rolls up to a number that looks healthy while the system around those developers quietly degrades.

This is where Goldratt's Theory of Constraints, from his 1984 book The Goal, applies: a useful review identifies the one binding constraint in the system rather than a long, unranked list of findings. In Goldratt's framing, any improvement made anywhere other than the constraint is an illusion. The job of the review is to find where the organization is actually limited, and to point capacity at it.

The five dimensions worth reviewing

A good review needs a structure that makes you think systematically about the whole organization, or it turns into a tour of whatever dashboards happen to be open. The DRIVE framework gives you five dimensions that together describe organizational health, each anchored on a single leadership question and the handful of key performance indicators (KPIs) that answer it. (For the full breakdown of metrics under each pillar, download the full DRIVE framework report.)

Delivery

Are we shipping fast, and is it sustainable? Deploy frequency and lead time tell you how quickly code reaches production. On-call pager volume tells you what that pace is costing the people sustaining it. A high and rising pager load is the andon cord being pulled, the signal to stop and fix the line before the damage compounds.

Reliability

Are we delivering on the promises we made to customers? Reliability is where engineering output turns into customer satisfaction or churn, so critical metrics should stay grounded in the customer's actual experience: functional SLOs reviewed as pass/fail, and Sev0/Sev1 incident counts. Automated tests can hand you false confidence, so reliability is judged on customer-facing outcomes rather than green dashboards.

Initiatives

Are our org-wide engineering investments making progress? Migrations, security mandates, dependency deprecations, and AI tooling rollouts get tracked haphazardly and deprioritized in favor of feature work. Reviewing Tier 1 initiative completion at the leadership level is what keeps them moving.

Vigilance

Are we actively defending our systems and managing acceptable operational risk?

1. Open critical and high fixable CVEs
2. Assets below the minimum security and compliance bar
3. Orphaned assets with no accountable owner

AI-generated code expands the attack surface faster than ever, which makes this a standing agenda item rather than a quarterly audit.
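The three Vigilance counts above can be pulled from an asset inventory with very little logic. The following is a minimal sketch, assuming a hypothetical `Asset` record with an owner, a security-bar flag, and a list of open CVEs; the field names and sample services are illustrative and not taken from any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Asset:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    owner: Optional[str]            # None means no accountable owner
    meets_security_bar: bool
    # Each open CVE is (severity, fix_available)
    open_cves: List[Tuple[str, bool]] = field(default_factory=list)


def vigilance_summary(assets):
    """Compute the three Vigilance numbers for a list of assets."""
    fixable = sum(
        1
        for asset in assets
        for severity, fix_available in asset.open_cves
        if severity in {"critical", "high"} and fix_available
    )
    below_bar = [a.name for a in assets if not a.meets_security_bar]
    orphaned = [a.name for a in assets if a.owner is None]
    return {
        "fixable_critical_high_cves": fixable,
        "below_bar": below_bar,
        "orphaned": orphaned,
    }


# Illustrative data only.
assets = [
    Asset("payments-api", "team-payments", True, [("critical", True), ("low", True)]),
    Asset("legacy-batch", None, False, [("high", True), ("high", False)]),
    Asset("search", "team-search", True),
]
summary = vigilance_summary(assets)
print(summary)
```

Listing the orphaned and below-bar assets by name, rather than only counting them, makes it easier to leave the review with a specific owner assigned to each one.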

Efficiency

Are we putting resources against the right problems?

1. Cloud spend against budget
2. Internal AI and LLM token costs
3. Share of capacity going to new work versus maintenance

Engineers' time is the largest line item in most engineering budgets, so this is where operational efficiency shows up as real money.
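The Efficiency numbers come down to simple arithmetic. Here is a small sketch, assuming you can export hours (or story points) per category and a spend figure per period; the category names and amounts are made up for illustration.

```python
def capacity_share(hours_by_category):
    """Return each category's fraction of total capacity."""
    total = sum(hours_by_category.values())
    if total == 0:
        return {category: 0.0 for category in hours_by_category}
    return {category: hours / total for category, hours in hours_by_category.items()}


def budget_variance(actual, budget):
    """Fractional over- or under-spend; positive means over budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (actual - budget) / budget


# Illustrative figures only.
shares = capacity_share({"new_work": 300, "maintenance": 200})
cloud_variance = budget_variance(actual=115_000, budget=100_000)
print(shares, cloud_variance)
```

Reporting shares and variance as fractions keeps the numbers comparable from one period to the next even as headcount or budget changes.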

The review has to keep pace with the system

Findings from a quarterly review describe an organization that has already changed by the time they reach the room, so the conversation keeps orienting around problems that have shifted shape or quietly resolved on their own.

The fix is a faster loop. Kaizen, the practice of continuous improvement at the heart of the Toyota Production System (documented in Jeffrey Liker's The Toyota Way), treats improvement as an ongoing operating rhythm rather than a once-a-quarter event. The organizations that run engineering operations well have landed in the same place. AWS has run its Wednesday Ops Review at the SVP level every week for years. Google SRE runs a weekly production meeting it describes as a powerful feedback loop. Stripe rebuilt its review around a weekly cadence with a rotating facilitator after an earlier version lacked impact.

In practice, a continuous review looks like scorecards evaluated on every service, dashboards that reflect current state, and initiative progress tracked against where the org is today rather than where it was at the start of the quarter. Run as a blameless, open forum, it also does quiet cultural work: teams that surface and fix problems in the open tend to show higher employee engagement than teams that absorb the same issues silently.

Six questions your operational excellence review should answer

The review earns its hour only if it produces real answers to questions that change what people do next. These are the six worth walking out with every time:

1. Which teams are shipping slower than their own recent baseline, and why? Compare each team against its trailing average, not against each other.
2. Which customer-facing SLOs are breached right now, and does the incident record agree with them? Green SLOs alongside a real outage means the SLOs are measuring the wrong thing.
3. Which Tier 1 initiatives are blocked, and who is accountable for unblocking them? When an org-wide migration stalls, the constraint is almost always sitting above the team executing it.
4. How many critical and high CVEs are open, and how many assets have no valid owner? Orphaned assets are risk with nobody accountable, which makes them everybody's problem.
5. Where is engineering capacity actually going this period? The split across new work, toil, and compliance is rarely what leadership assumes it is.
6. What did we commit to fix at the last review, and did we? Tracking action-item completion is what keeps the review from quietly becoming a status update.
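The first two questions lend themselves to automation. The sketch below assumes weekly deploy counts per team and a pass/fail SLO status per service; the four-week window, the 80% tolerance, and the team and service names are all assumptions chosen for illustration, not recommended thresholds.

```python
from statistics import mean


def below_own_baseline(weekly_deploys, window=4, tolerance=0.8):
    """Flag teams whose latest week falls below tolerance * their trailing average.

    Each team is compared only against its own history, never against other teams.
    """
    flagged = {}
    for team, series in weekly_deploys.items():
        if len(series) < window + 1:
            continue  # not enough history to form a baseline
        baseline = mean(series[-(window + 1):-1])  # the weeks before the latest one
        latest = series[-1]
        if baseline > 0 and latest < tolerance * baseline:
            flagged[team] = (latest, baseline)
    return flagged


def slo_incident_disagreements(slo_passing, services_with_open_sev):
    """Services whose SLOs are green while a Sev0/Sev1 incident is open."""
    return sorted(
        service
        for service, passing in slo_passing.items()
        if passing and service in services_with_open_sev
    )


# Illustrative data only.
deploys = {
    "checkout": [10, 12, 11, 11, 6],
    "search": [3, 3, 4, 2, 3],
    "new-team": [5, 5],
}
slow_teams = below_own_baseline(deploys)
suspect_slos = slo_incident_disagreements(
    {"checkout": True, "search": True, "auth": False},
    {"checkout", "auth"},
)
print(slow_teams, suspect_slos)
```

A service that shows up in the second list is a prompt to revisit the SLO definition itself, since it passed while customers were affected.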

Who runs the OpEx review, how it runs, and what it produces

Who's in the room

1. The CTO or VP of Engineering owns the review and runs it like a forecast meeting, interrogating anomalies rather than narrating slides.
2. Platform leads handle the data gathering, ideally through automation so the burden never lands on engineering managers. Shawn Burke, who built this practice at Microsoft, Uber, and SoFi, is blunt about it on the Braintrust podcast: the moment someone has to assemble the report by hand each week, the review won't survive.
3. TPMs translate findings into rollouts with owners and dates.
4. Staff engineers bring domain context on the specific systems under discussion.
5. A rotating facilitator role, borrowed from how Stripe fixed its review, keeps someone responsible for asking the hard questions and following up afterward.

The room stays open to anyone in engineering, product, or design. This is shared visibility, not leadership in an ivory tower.

How a single review runs

The arc should be similar every time. Someone (or even better, an automation) pulls the current-state read across the five DRIVE dimensions before the meeting, so you open on data instead of status updates. A good review opens by celebrating operational wins and flagging the upcoming changes, freezes, and migrations that could move the numbers next week. Then the group walks each dimension, focusing on the anomalies, and checks what the last review committed to and whether it happened. Every gap that matters leaves the room as a named initiative with an owner and a date, and the whole picture gets written up for leadership in the terms the business already tracks. The exact blocks and timing shift with org size; the DRIVE framework lays out full agendas for startup, local-team, and org-wide reviews.

What you walk out with

A good review produces a small set of concrete artifacts: a current-state read across the five dimensions, an updated initiative backlog with owners and deadlines, a refreshed set of scorecards, a current risk register, and a leadership readout that ties what the org found back to the outcomes the business cares about. If the only thing a review produces is a deck, it has failed at its actual job, which is to change where the organization spends its next unit of effort.

Operationalizing the findings

The hard part of any OpEx review is the shift from measuring something to moving it. A finding that doesn't become a named initiative with an owner and a deadline is a finding the organization has already decided to ignore.

Tie those initiatives back to scorecards so the improvement is enforced in developer workflows through automation rather than left aspirational on a wiki. Cortex Scorecards let you set the benchmarks for what production readiness and security compliance mean in your org, then measure every service against them; Initiatives turn the gaps into tracked, owned work. That is how you assure quality at the source, the lean principle of building quality in where the work happens rather than catching defects in a review downstream. Then connect the engineering picture to what leadership is measured on: revenue protection, the cost of incidents, customer satisfaction, time to market.

Tala did this. After standardizing on a continuous model of standards and ownership, the company increased the number of services meeting its maturity bar by 8x, and cut deploy time from weeks to minutes.
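To make the scorecard-to-initiative loop concrete, here is a generic sketch of the idea: a set of rules evaluated against every service, with each failed rule turned into an owned, dated work item. This is not the Cortex API; the `Rule` type, the rule names, the service fields, and the placeholder URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One scorecard rule: a name and a pass/fail check on a service record."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical production-readiness rules; a real org would define its own.
RULES = [
    Rule("has_owner", lambda s: bool(s.get("owner"))),
    Rule("has_runbook", lambda s: bool(s.get("runbook_url"))),
    Rule("no_open_critical_cves", lambda s: s.get("open_critical_cves", 0) == 0),
]


def evaluate(service, rules=RULES):
    """Return the names of the rules this service fails."""
    return [rule.name for rule in rules if not rule.check(service)]


def gaps_to_initiatives(services, due):
    """Turn every failed rule into a work item with an owner and a date."""
    items = []
    for service in services:
        for rule_name in evaluate(service):
            items.append({
                "service": service["name"],
                "rule": rule_name,
                "owner": service.get("owner") or "UNASSIGNED",
                "due": due,
            })
    return items


# Illustrative data only.
services = [
    {"name": "payments-api", "owner": "team-payments",
     "runbook_url": "https://example.invalid/runbook", "open_critical_cves": 0},
    {"name": "legacy-batch", "owner": None, "open_critical_cves": 2},
]
backlog = gaps_to_initiatives(services, due="2025-01-31")
for item in backlog:
    print(item)
```

Items that come out as UNASSIGNED are worth raising in the review itself, since a gap with no owner is exactly the kind of finding that otherwise gets ignored.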

Running a modern operational excellence review

Think of the OpEx review as backpressure. AI applies enormous forward pressure on an engineering org: more code, shipped faster, bearing down on every downstream check. The review is the counter-pressure that keeps the system inside a safe operating range instead of letting risk compound out of sight. Metrics become useful only when a recurring review forces the organization to interrogate them and reallocate against what they reveal.

This is what engineering operations looks like as a discipline: running the engineering organization as an observable system you can measure and improve on a steady cadence. Run consistently, the review becomes part of engineering's rhythm of business, the same way reliability reviews already are for the strongest engineering teams and forecast reviews are for the best sales orgs. As AI agents join the system as its newest contributors, multiplying output and the rate of change, that backpressure matters even more.

A factory line that never stops for defects carries a lower sustained throughput, not a higher one, and an engineering org running an AI software factory is no different. Operational rigor is not the brake on speed; it is what lets you go a little faster each lap without driving into the wall. The orgs that build this discipline will not spend the AI era reacting to last quarter's incidents. They will spend it pulling ahead.

See how your organization measures up across the five DRIVE dimensions and get specific recommendations in about five minutes. Take the DRIVE assessment.

FAQ

How often should you run an operational excellence review?

Weekly or biweekly. By the time a quarterly review's findings circulate, the organization they describe has already moved on. AWS runs its ops review weekly at the SVP level; if a company that size can hold the cadence, smaller orgs can too. The review should be predictable and effectively unmissable; if schedules get too busy for it, that is usually a sign it matters more, not less.

Who owns the operational excellence review?

The CTO or VP of Engineering should own it. Many orgs add a rotating facilitator whose job is to interrogate anomalies and follow up on action items so the review keeps producing change.

What metrics belong in an operational excellence review?

Start with the critical metrics across the five DRIVE dimensions: deploy frequency, lead time, and on-call volume for Delivery; functional SLO status and Sev0/Sev1 incidents for Reliability; Tier 1 initiative completion for Initiatives; open critical and high CVEs and orphaned assets for Vigilance; and spend against budget plus capacity allocation for Efficiency. Use the metrics you already have, then expand. The cadence and the conversation matter more than collecting every possible number.

Does an operational excellence review only measure how fast we write code?

No. Writing code was never the binding constraint, and AI coding tools have made that plain. The review looks at the whole value stream, the full software development lifecycle from idea to production and operation, rather than the authoring step alone. We can pull inspiration from manufacturing here as well; in practice that means watching cycle times stage by stage to find where process efficiency drops, and using process mapping to see where work sits waiting. Lean frames the goal as a pair of principles, improve flow and pull: keep work moving through the system, and let real demand rather than a quarterly plan decide what gets pulled in next.

What's the difference between an operational excellence review and a postmortem?

A postmortem analyzes a single incident after it happens, focused on why the system permitted that specific failure. An operational excellence review is recurring and forward-looking, covering the whole organization's posture across all five dimensions. Incident readouts feed into the review, but the review is broader: it exists to reallocate time, people, and money before the next failure, not only to learn from the last one.
