Podcast

Your Ops Review is Theater, and That's the Point: Aleks Rudzitis on Turning Reliability into a Shared Value

  • Ganesh Datta

    Host

    CTO & Co-founder of Cortex

  • Aleks Rudzitis

    Principal Engineer at AWS

July 2, 2026

In This Episode

Aleks Rudzitis is a Principal Engineer at AWS, working in cryptography. He spent years at AWS and Stripe, two shops known for operational rigor, where weekly ops reviews gave him sharp opinions about what makes the practice work and what makes it fail. He writes about reliability on his blog, Bits and Being. The opinions expressed in this podcast are his own.

Aleks joins Cortex CTO Ganesh Datta to make the case that the operational review is a human backstop that matters more as AI accelerates how fast teams ship. They discuss why a standing meeting holds where an asynchronous report tends to atrophy, why the review's authority comes from who's in the room rather than the meeting itself, and how the principle Aleks calls "human-driven infrastructure" persists through the shift to agents. Aleks closes on the concept of "creating space": a good ops review exists to begin the right conversation rather than to review graphs for their own sake.

Read more from him on operational reviews here and on preventing incidents here.

You’ll learn

  • The ops review carries no authority of its own; it is the venue in which authority is exercised. Without that authority present, you've scheduled a status update nobody is obligated to act on.

  • An asynchronous report tends to atrophy in a way a standing meeting does not. A review that becomes an email becomes an email no one reads, and the culture erodes. People absorb the message differently when a leader is in the room asking how the service is performing.

  • Setting the bar at perfection is how you lose the room. The fastest way to alienate a team that's new to reliability work is to imply that zero failures is the standard. Error budgets help make the tradeoff between reliability and shipping speed legible.

  • Test coverage is the precondition for moving fast with AI. Tests act as a feedback loop that keeps an agent from misjudging how a system actually behaves, so teams with solid coverage can move faster with AI rather than slower.

  • An effective review gives both leaders and engineers a voice. Engineers know which parts of the system are fragile; leaders know what the business priorities are. The review works when it gets those two conversations into the same room.

Quotes

The meeting doesn't have authority on its own. It's a mechanism for that authority to be exercised by the people who do have it.

Aleks Rudzitis

Principal Engineer at AWS


You have to break that normalization of deviance from what we've defined as acceptable.

Aleks Rudzitis

Principal Engineer at AWS


The definition of correctness hasn't changed. What's changed is your opportunity to do something about it.

Aleks Rudzitis

Principal Engineer at AWS


I don't think of myself as a person who writes code to deliver a feature. I think of myself as a person who works with other people to deliver a product that's valuable for even more people. Technology is just a means.

Aleks Rudzitis

Principal Engineer at AWS


LLMs don't get tired. They don't get bored. You can now apply formal verification techniques to a lot more software than was possible before.

Aleks Rudzitis

Principal Engineer at AWS


Even with AI, software will remain a team sport.

Aleks Rudzitis

Principal Engineer at AWS


Timestamps

  • (01:15)

    Aleks's background: AWS, Stripe, and back to AWS.

  • (02:33)

    What an operational review is, and why they exist.

  • (04:32)

    Why a recurring meeting works as a backstop, and where its authority comes from.

  • (06:36)

    The "theater" of running it in public, and the signal it sends to engineers.

  • (08:58)

    The chicken or the egg: whether ops reviews can build a reliability culture from scratch.

  • (10:52)

    Error budgets, and the trap of setting perfection as the standard.

  • (13:02)

    Why a live meeting beats an async report, and how to keep it from going dull.

  • (19:20)

    What "human-driven infrastructure" means and where the idea came from.

  • (21:59)

    Three ways AI changes the picture: a sharper backstop, more data to digest, pressure to build APIs.

  • (27:00)

    How AI changes operational excellence, and what persists.

  • (29:22)

    Testing is cheap: how AI changes incident prevention.

  • (36:43)

    Where to start with operational reviews.

  • (39:19)

    Creating space: the review as a forum for conversation, not a data readout.

Other episodes

Podcast

Operational Excellence (OpEx) Reviews: The Weekly Meeting That Actually Changes Behavior

In this episode of Braintrust, Cortex co-founder and CTO Ganesh Datta sits down with Shawn Burke, Distinguished Engineer at Cortex. Shawn has led operational excellence efforts at Microsoft, Uber, and SoFi, and brings a practitioner's playbook for how to actually run these weekly reviews, from automating red/green reports to including senior leadership and following through relentlessly on action items.

They dig into the mechanics that make operational excellence reviews successful and the reasons most attempts fall short, including how to define SLOs that reflect customer experience rather than engineering vanity, why the whole process collapses without automation, and how AI coding assistants are creating new categories of operational risk worth tracking.

June 18, 2026


Shawn Burke

Distinguished Engineer at Cortex

Podcast

Okta's Dinesh Sukhija on Meeting AI with AI, and the Convergence of Platform, SRE, and Security

Dinesh Sukhija is a Director of Engineering at Okta, where he leads SRE, security infrastructure, and enterprise security tooling in the cyber defense org. Before Okta, he ran infrastructure and developer experience at Opendoor, where he was an early Cortex customer and helped build the DevEx practice.

In this episode of Braintrust, Dinesh joins Cortex CTO Ganesh Datta to talk about how platform engineering, SRE, and security are changing under AI: why golden paths still matter, why standardization is now an agent problem as much as a human one, and why the three functions may soon belong under one leader.

June 4, 2026


Dinesh Sukhija

Director of Engineering at Okta

Podcast

Why DevOps Transformations Fail in Regulated Industries, with Merge Ready's Matt Bailey

Matt Bailey is a DevOps consultant and the founder of Merge Ready, a DevOps community and YouTube channel. He spends most of his time working with large regulated organizations across finance, healthcare, and government, helping them untangle the tooling decisions and processes that stall their software delivery.

In this episode of Braintrust, Matt and Cortex CTO Ganesh Datta dig into why buying a new CI/CD platform doesn't count as a DevOps transformation, what "decision latency" costs regulated organizations, and how to automate compliance.

May 21, 2026


Matt Bailey

Founder & Executive Producer at Merge Ready
