AI changed how code gets written before it changed how code gets operated. Generation accelerated; the downstream controls that turn that output into reliable, secure software at a reasonable cost did not keep pace. The result is elevated risk, distributed unevenly across engineering organizations. A recent survey explains why the distribution is so uneven.
Gergely Orosz and Elin Nilsson recently published part 2 of the results from a survey of more than 900 engineers on how AI tooling is changing their work. Its central finding is the argument platform teams have been making to their leadership since AI coding tools arrived: AI amplifies the engineering practices an organization already has rather than compensating for the ones it lacks. Where standards, ownership, and tooling were mature, AI accelerated delivery. Where they were not, the same tools accelerated technical debt and instability.
This is the first large, independent dataset to support that claim, and it isolates the variable that matters. The outcome turned on the operational maturity of the standards, ownership, and tooling each organization had in place when AI arrived. None of those three is a fixed trait. Each can be measured and improved deliberately, and the rest of this piece focuses on how.
Standardize and measure production readiness
Most engineers in the survey reported a decline in codebase quality after adopting AI, even as their dashboards continued to reward throughput. We found similar results in Cortex's 2026 Benchmark report: pull requests per author up 20% year over year, incidents per pull request up 23.5%, and change failure rates up roughly 30%. The teams that didn't degrade had one thing in common: testing, documentation, and review were already measured standards before AI entered the workflow.
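The two benchmark figures compound. As a rough illustration, assuming both percentages describe the same population over the same period (an assumption made here, not a figure from the report), incidents per author is the product of pull requests per author and incidents per pull request:

```python
# Figures quoted above; treating them as directly composable is an assumption.
pr_per_author_growth = 0.20      # pull requests per author, year over year
incidents_per_pr_growth = 0.235  # incidents per pull request, year over year

# incidents/author = (PRs/author) * (incidents/PR), so the growth rates multiply.
incidents_per_author_growth = (
    (1 + pr_per_author_growth) * (1 + incidents_per_pr_growth) - 1
)

print(f"Implied growth in incidents per author: {incidents_per_author_growth:.1%}")
```

Under that assumption each author is associated with roughly 48% more incidents than a year earlier, which is the cost a throughput-only dashboard leaves out.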
The control that closes this gap is a measured definition of production readiness, applied continuously rather than aspired to:
Define what production-ready means for your organization as five to seven checks (test coverage thresholds, runbook presence, dependency freshness, security baselines).
Score every service against them on a fixed cadence, and surface the result in the same dashboard leadership already uses to track velocity.
Run it weekly, so a downward trend is visible to the organization before an incident makes it visible to customers.
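The three steps above can be sketched as a small scorecard. This is a minimal illustration, not a Cortex API: the `Service` fields, the check names, and every threshold (80% coverage, 180 days, zero critical vulnerabilities) are hypothetical placeholders for whatever your organization decides predicts incidents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    """Hypothetical snapshot of the data a catalog might hold for one service."""
    name: str
    test_coverage: float         # fraction of lines covered, 0.0 to 1.0
    has_runbook: bool
    oldest_dependency_days: int  # age of the most out-of-date dependency
    open_critical_vulns: int


# Each check is a named predicate. All thresholds are illustrative assumptions.
CHECKS: dict[str, Callable[[Service], bool]] = {
    "test_coverage": lambda s: s.test_coverage >= 0.80,
    "runbook_present": lambda s: s.has_runbook,
    "dependency_freshness": lambda s: s.oldest_dependency_days <= 180,
    "security_baseline": lambda s: s.open_critical_vulns == 0,
}


def score(service: Service) -> dict[str, bool]:
    """Run every check against one service."""
    return {name: check(service) for name, check in CHECKS.items()}


def readiness(service: Service) -> float:
    """Fraction of checks the service passes, from 0.0 to 1.0."""
    results = score(service)
    return sum(results.values()) / len(results)


def failing_checks(service: Service) -> list[str]:
    """Names of the checks the service currently fails."""
    return [name for name, passed in score(service).items() if not passed]


payments = Service("payments", 0.85, True, 400, 0)
print(payments.name, readiness(payments), failing_checks(payments))
```

Running a function like `readiness` over every catalogued service on a weekly schedule, and storing the result, is what produces the trend line that sits next to velocity.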
Placing quality and speed in the same review changes the incentive structure. The trade-off between them stops being invisible, which removes the option of reporting speed while quietly redefining "shipped" as "the PR merged." It also means every service AI generates is measured against the same bar the moment it enters the catalog, rather than passing as complete and surfacing as a problem later.
One caution, and it is the one organizations get wrong most often: scoring everything turns the scorecard into process that engineers route around. The teams it works for measure the small set of checks that genuinely predict an incident and leave the rest alone. Tala took that approach. Its platform team rebooted the SDLC around a data-first definition of service health and held every team to a single maturity bar. Adoption proved smoother than the team expected, and the results followed: an 8x increase in services meeting required maturity standards, 98% with self-service documentation, security testing on every deploy, and deploy time reduced from weeks to minutes. Tala's VP of Platform Engineering and VP of Program Management walk through how the rollout actually ran in Production Readiness in Practice.
Expand the catalog’s remit to cover AI
Survey respondents identified tooling consistency at the team level as one of their most significant frustrations. Teams adopt different assistants, write different prompts, and converge on different workflows, and the knowledge built inside one team stops at its boundary. Onboarding slows rather than accelerates, because there is no shared, sanctioned path to hand a new engineer. The same fragmentation is already moving beyond the codebase: the Wall Street Journal recently reported that Lyft, DaVita, and GitLab are managing "AI agent sprawl," with employees creating agents faster than IT can inventory them.
Tool sprawl and workflow sprawl are the same problem at different levels of the organization, and both resolve to a single requirement: a system of record. The catalog's remit has to expand to cover AI as it already covers services. Approved tools, sanctioned workflows, and reusable prompts and specifications become first-class catalog entries, each with an owner, a version, and a review history. Tool sprawl is the obvious half of the problem. Workflow sprawl is the half that stays hidden: the same task implemented five ways across five teams, each with its own prompt library and its own undocumented standard for what "reviewed" means.
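One way to picture a first-class catalog entry for AI assets is a record that carries the three properties named above: an owner, a version, and a review history. The sketch below is an assumption about shape only; the class names, the `EntryKind` values, and the rule that the most recent review decides whether an entry is sanctioned are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class EntryKind(Enum):
    TOOL = "tool"
    WORKFLOW = "workflow"
    PROMPT = "prompt"
    SPECIFICATION = "specification"


@dataclass(frozen=True)
class Review:
    reviewer: str
    reviewed_on: date
    approved: bool


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for an AI tool, workflow, prompt, or spec."""
    name: str
    kind: EntryKind
    owner: str
    version: str
    reviews: list[Review] = field(default_factory=list)

    def is_sanctioned(self) -> bool:
        """Assumed rule: an entry is sanctioned if its latest review approved it."""
        if not self.reviews:
            return False
        latest = max(self.reviews, key=lambda r: r.reviewed_on)
        return latest.approved


entry = CatalogEntry(
    name="migration-review-prompt",
    kind=EntryKind.PROMPT,
    owner="platform-team",
    version="1.2.0",
    reviews=[
        Review("alice", date(2025, 1, 10), approved=False),
        Review("bob", date(2025, 2, 3), approved=True),
    ],
)
print(entry.name, entry.is_sanctioned())
```

Keeping the review history on the entry itself gives "reviewed" one recorded definition instead of five undocumented ones.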
Once the sanctioned path is catalogued and demonstrably easier than building from scratch, adoption follows on its own, and each new engineer begins from an established pattern rather than reconstructing one.
Record and review ownership coverage
As AI generates more code, responsibility for understanding and maintaining it concentrates on a shrinking group of engineers who still hold the system in their heads. The survey frames this directly: ownership is eroding and the bus factor is worsening. The volume of code rises while the number of people accountable for any given part of it falls.
Expanding what the catalog covers only pays off if the ownership already recorded in it is accurate, which is a separate undertaking. Ownership has to move from tribal memory to recorded fact: when AI ships a change, the catalog should capture who reviewed it, who owns its consequences, and which standards it satisfied. Coverage establishes that a service exists. Accountability establishes that someone can answer for it. The second is the property that survives the volume now arriving.
Review ownership coverage across the catalog and confirm, for every production service, that a named owner can speak to it today. If answering "who owns this?" takes longer than ten seconds, the organization is not prepared for the throughput AI is about to introduce. Most catalogs were sized for an era in which human capacity was the constraint on new service creation, on the order of 50 services per quarter. AI removes that constraint. Services that fail the review are reassigned, documented, or retired before the next tooling rollout, not after an AI-generated migration script takes down production in the middle of the night with no clear owner to call.
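The mechanical half of that review can be sketched as a query over catalog records. The `ServiceRecord` shape and function names below are hypothetical, and the check only confirms that an owner is recorded; whether that owner can actually speak to the service today still takes a conversation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical catalog row; field names are assumptions for illustration."""
    name: str
    owner: Optional[str]
    in_production: bool


def unowned_production_services(records: list[ServiceRecord]) -> list[str]:
    """Production services with no recorded owner (missing or blank)."""
    return [
        r.name
        for r in records
        if r.in_production and not (r.owner or "").strip()
    ]


def ownership_coverage(records: list[ServiceRecord]) -> float:
    """Fraction of production services that have a recorded owner."""
    production = [r for r in records if r.in_production]
    if not production:
        return 1.0  # nothing in production, so nothing is unowned
    unowned = unowned_production_services(production)
    return (len(production) - len(unowned)) / len(production)


catalog = [
    ServiceRecord("checkout", "payments-team", True),
    ServiceRecord("search", "discovery-team", True),
    ServiceRecord("legacy-billing", None, True),
    ServiceRecord("prototype-x", None, False),
]
print(unowned_production_services(catalog), ownership_coverage(catalog))
```

The list this returns is the queue of services to reassign, document, or retire before the next tooling rollout.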
The foundation is the part you control
The amplifier holds in both directions. Rather than compensating for weak standards, fragmented tooling, or unclear ownership, AI widens the distance between organizations that have installed those controls and organizations that have not, and it does so faster than any prior shift in how software is built.
Codebase quality, tooling consistency, and ownership clarity were always the foundations of a healthy engineering organization. AI has raised the cost of operating without them. Organizations remain accountable for the same outcomes they always were, delivering reliable, secure software at a reasonable cost, regardless of how much of the code a model wrote.
Nathen Harvey, who leads the DORA research program at Google Cloud, is giving a talk on this dynamic at EVOLVE, our conference in New York City on September 24. It's called "The amplifier effect: Elevating developer experience in the AI era." Harvey will walk through the AI adoption "J-curve," why early dips in flow and stability are expected, and how quality internal platforms and healthy data ecosystems decide which side of the amplifier a team ends up on.


