How to Measure an Engineering Organization in the AI Era: the DRIVE Framework

Engineering leadership is in the middle of a real transition, and most of the leaders I talk to know it. AI has reshaped how software gets built quickly enough that the operating models many of us spent a decade refining no longer fit cleanly, and there is a great deal of serious work happening across the industry to figure out how these models should evolve. The teams I find most impressive right now are the ones treating their operating model as an open question rather than a settled one.

One question keeps surfacing in that work, and the instruments most of us have do not answer it well. We can see that our teams are shipping more than ever, that agents are doing real work, that velocity is climbing. What is much harder to see is whether the organization as a whole is getting better at turning all of that activity into reliable software, or is simply getting faster at producing risk. The conversation of the last few years has centered almost entirely on measuring the productivity of individual developers and teams and how much AI accelerates them, and almost none of it speaks to the health of the organization those developers belong to. That gap is the subject of this piece, and I have come to believe it will shape how we lead engineering for years to come.

The work of engineering is moving up a level

For most of the history of software, we built our instruments around the act of writing code, because that was the part we could see and the part that felt like the real work. Many frameworks, like SPACE and DX Core 4, have focused on the effectiveness of the individuals producing that code. DORA was a step forward in focusing on the delivery of software rather than the productivity of individuals. We always knew deep down that writing code was never really the binding constraint (the Theory of Constraints would suggest the limit usually lay somewhere else), but regardless of whether we acted on that knowledge, AI is now taking it out of our hands.

As agents take on more of writing, reviewing, merging, and deploying code, the work that defines an engineering organization moves up a level. Engineers increasingly design and build the systems that produce software, and the organization's hardest problem moves up with that shift, into the operational layer that has to keep those systems healthy and absorb and stand behind everything they generate in production. The software development lifecycle begins to behave like an AI software factory, and the engineer's work shifts from running a single line to designing and improving the factory as a whole.

Our instruments still point at the individual

The difficulty is that almost everything we measure continues to describe the individual developer or their immediate team: how much they ship, how quickly, and how they experience the work. Those measures were more relevant (though arguably still not sufficient) when humans wrote every line, but they were never designed to answer the question that matters most under these new conditions: whether the organization as a whole can sustainably turn customer needs into reliable software. That is a question of organizational effectiveness rather than developer productivity, measured at a different altitude and against a different unit, and it explains how an organization can make every individual developer measurably faster while growing less able, in aggregate, to keep its promises. The limiting factor no longer sits inside any single person. It sits in the systems that surround them.

Our own research points the same way. In the Cortex 2026 benchmark report, rising pull request throughput was accompanied by rising incident volume, the two climbing in close step. The most straightforward reading is that organizations have begun producing faster than they can safely operate.

We have seen this before, in another industry

What I find steadying, having spent real time in this, is that the underlying problem is not new. It is only new to our discipline.

Manufacturing confronted a version of it a century ago and answered with a body of practice it came to call Operational Excellence: treat the whole factory as a single observable system, measure it as a system rather than as a sum of its parts, and grant everyone working in it the authority to halt the line the moment something goes wrong. The plants that sustained the highest throughput over time were the ones that invested in the instrumentation, the signals, and the regular cadence of review that allowed them to run fast without losing control.

Software's nearest equivalent is the operational review, and I spent a considerable amount of time studying how organizations such as AWS, Stripe, and Google actually conduct theirs. The cultures and formats vary, but the mechanism is strikingly consistent. On a fixed and unmissable cadence, the most senior engineers in the organization sit down together, interrogate the health of the whole, and reallocate time, people, and money according to what they find. These were organizations that already had every gate one could ask for (code review, CI/CD, automated testing), and they held the reviews regardless, because each of those gates operates at the level of an individual change, and none of them can tell you whether the organization itself is sound.

Why I wrote the DRIVE framework

The DRIVE framework is my attempt to make that discipline legible and adoptable. It is a framework for measuring organizational effectiveness across five pillars, paired with a recurring Operational Excellence review that turns measurement into action, and is built to extend the signals teams already track, including DORA's, rather than to displace them. The frameworks already in use describe how productive a team's developers are; DRIVE describes whether the organization around them can deliver.

The practices behind DRIVE already exist; they live in some of the strongest engineering organizations in the world. My work was to study how those organizations operate, distill the pattern they share, and write it down, so that a team of fifty can adopt in a matter of weeks a discipline that a team of five thousand spent years constructing. That compression matters now in a way it did not before, because in the AI era a team of fifty routinely produces the volume that once required a team of five hundred.

We will continue to be judged on the outcome we have always been judged on, which is whether we deliver quality products that honor our commitments to customers at a cost the business can sustain. AI has not altered that obligation; it has raised the pressure on the systems responsible for meeting it. The reflex is to relieve that pressure by slowing down, but slowing down for its own sake tends to bury the bottlenecks rather than clear them. The organizations that pull ahead will be the ones that learn when to slow down and when to push, and build the systems that let them move fast without losing control.

This is the most consequential shift in how we build software in a generation, and the teams that meet it with real discipline will be the ones that shape what comes next. The full framework, the research behind it, and an assessment for locating where your own organization stands are available at cortex.io/drive.

Ganesh
