Podcast

How Thrive Market's SVP of Engineering thinks about reliability culture

  • Ganesh Datta

    Host

    CTO & Co-founder of Cortex

  • Randy Shoup

    SVP of Engineering at Thrive Market

February 26, 2026

In This Episode

In this episode of Braintrust, Cortex co-founder and CTO Ganesh Datta sits down with Randy Shoup, SVP of Engineering at Thrive Market. Randy shares lessons from his leadership roles across multiple companies and explains how measurement and transparency can help teams build stronger engineering cultures.

Randy and Ganesh chat about how fear can block progress, why recovery speed matters more than trying to prevent every failure, and how teams improve through steady, incremental gains. They also discuss a few practical ways to build trust around metrics so organizations can use visibility for learning instead of punishment.

You’ll learn

  • Randy says teams are much more likely to care about reliability and delivery when they can clearly see their current state.

  • Randy argues that metrics like deployment frequency, change failure rate, and MTTR should never be used to stack rank individual engineers.

  • Leaders can reduce anxiety by being direct about why they're introducing metrics and by proving over time that the data is used to help teams improve.

  • Randy says that resilience depends on how fast teams recover when failures happen, not on the unrealistic goal of eliminating all failures.

  • Sustained improvement comes from celebrating progress, sharing what works, and raising standards over time.

Quotes

"The sole goal of the team is to mitigate the failure, whatever it is, and restore service as quickly and as fully as possible."

Randy Shoup

SVP of Engineering at Thrive Market

"As a service provider, I should prioritize recovering quickly when I do fail, as opposed to trying to prevent all failures."

Randy Shoup

SVP of Engineering at Thrive Market

"The unit of production of value is the team."

Randy Shoup

SVP of Engineering at Thrive Market

"A lot of times, incidents are unplanned investments."

Randy Shoup

SVP of Engineering at Thrive Market

"How do I get people to care about X? Measuring X and being transparent about X. That is the way to do it."

Randy Shoup

SVP of Engineering at Thrive Market

Timestamps

  • (02:45)

    Three types of organizational culture and why fear blocks transparency.

  • (08:15)

    Using DORA metrics to build trust and improve delivery at scale.

  • (11:24)

    Why comparing a team to its past self works better than comparing teams to each other.

  • (18:03)

    Treating incidents as unplanned investments and capturing the learning return.

  • (26:28)

    Measurement and transparency as the first step toward a reliability culture.

  • (35:13)

    Making MTTR visible and putting service owners on call.

Other episodes

Podcast

Why great engineering teams don’t accept “normal” errors

Cortex co-founder and CTO Ganesh Datta sits down with Jeff Schnitter, a Solution Architect at Cortex. Jeff shares insights from his time as a Senior Principal Engineer at Workday, where he led developer experience and release engineering, to explore how organizations can successfully shift their internal culture.

The discussion covers the transition from a "Stockholm Syndrome" mindset where teams accept broken processes to a culture of reliability and security. Ganesh and Jeff also dive into the importance of incentives, the role of leadership in empowering teams, and why the most effective transformations start with identifying individual pain points rather than issuing top-down mandates.

January 29, 2026

Jeff Schnitter

Solution Architect at Cortex

Podcast

Why production readiness at Xero starts with the customer, not the checklist

Cortex co-founder and CTO Ganesh Datta sits down with Fred Mare, Principal Engineer at Xero in Melbourne, Australia. They explore what production readiness really means, why it should be framed around customer impact rather than internal processes, and how to build a sustainable program without overwhelming your engineering teams.

Fred also shares how Xero thinks about confidence scores for changes, why production readiness is a continuous journey rather than a one-time gate, and the importance of automating as much as possible to keep engineers focused on what matters most. The conversation also covers how AI coding assistance fits into production readiness, why security can't be separated from operational excellence, and Fred's best advice for engineering leaders just starting to build a production readiness program.

January 15, 2026

Fred Mare

Principal Engineer at Xero