QA, AI, and the return of the adversarial mindset

Cortex

March 26, 2026

The best QA engineers are always asking themselves (and others around them) what might break. When engineering teams shifted to agile delivery, that mindset largely moved out of dedicated roles and into the background. Automated testing took over the repetitive work, developers owned quality end-to-end, and velocity improved. What didn't carry over was the habit of looking at a feature and asking how a real user, an edge case, or unexpected load might expose it.

Now AI is writing code at a pace no QA org could have matched, even before the cutbacks. Output is up, but so is the surface area for failure. Across the industry, experts are starting to wonder if this trade was the right choice in the first place, and whether AI might be the thing that finally restores what was lost.

Rob Zuber, CTO at CircleCI and a guest on the Braintrust podcast, has spent over a decade watching engineering teams build and ship software. The best QA engineers he ever worked with, he says, weren't defined by their scripts. "They are not executing a script," Zuber said. "They have a bag of tools and they know how to execute with that bag of tools to try to get an outcome. When you smell an odd behavior as a human, you're thinking maybe, just maybe, if I put this one weird character in here, it's gonna break over here."

That instinct couldn’t be replicated in a test suite, but it might be promptable.

What makes great QA engineers so effective

Unlike other engineering disciplines, quality engineers are trained to break things. They ask what happens when, not if, something goes wrong.

The defining quality of the best QA engineers is their ability to think about how things would break. They understand that users don't interact with software the way engineers imagine; they are unpredictable, distracted, and occasionally adversarial. QA teams probe new features the way an attacker might, and in doing so, they uncover novel edge cases that can be addressed before the product hits the market.

How the industry moved away from QA

As teams adopted agile and continuous delivery, the traditional QA model, where a separate team reviewed work at the end of a sprint, became a genuine bottleneck. Releases queued waiting for sign-off, and so much time elapsed between when code was written and when it was tested that everyone lost context. Not surprisingly, bugs found in QA were more expensive to fix than bugs caught earlier in the development process.

The answer was to shift testing left, move quality checks into the development workflow, and automate the repetitive work. We gave developers ownership across every step from writing to production. And it worked! Velocity improved, cycle times shortened, and CI/CD pipelines became the new standard.

What got lost in the transition was harder to quantify. The best QA engineers brought more than test coverage. They had a unique ability to hold a feature in mind and ask what would happen if someone used it wrong, hit an edge case, or ran it under unexpected load. But as dedicated QA roles were deprioritized and talent shifted elsewhere, the capability diffused and largely faded. The creative, adversarial instinct that was cultivated in QA engineering proved difficult to encode in a test suite. Automated testing caught what it was written to catch, but couldn't catch what no one thought to test for.

How AI changes the quality equation

AI-assisted coding has re-sharpened a tension that never fully resolved. When engineers write code, output is bounded by human capacity. When AI assists, that ceiling rises, which is mostly good until it isn't.

More code, written faster, with limited review, means more surface area for failure. Unit tests generated by LLMs may validate behavior the model inferred, not behavior the system actually needs. Reviews that happened line by line now happen over larger diffs with tighter time constraints. The organizational muscle for catching unanticipated errors is weaker than it once was, and the places an error can lurk have multiplied.

On the other hand, LLMs have absorbed an enormous amount of public code, incident reports, and known failure patterns. As Zuber put it, "An LLM has a massive bag of tools because it's read every line of code that's ever been public. With a little bit of guidance in the right direction, you should be able to say: I built this thing. What's going to go wrong?"

In many cases, LLMs have not been instructed to use their vast knowledge base adversarially. They're prompted to build things, so they build things. Ask an LLM to write a feature and it will. Ask it what will break, give it the right context about the system, use cases, and constraints, and it starts to behave like the best QA engineer that ever existed, at scale. It can surface edge cases no one thought to script. It can bring the adversarial mindset back to a process that lost it.
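To make the shift from "build" prompts to "break" prompts concrete, here is a minimal sketch of how a team might frame that context. The `build_breakage_prompt` helper and its fields are hypothetical illustrations, not a real Cortex or CircleCI API; the resulting prompt would be passed to whatever LLM client a team already uses.

```python
def build_breakage_prompt(system: str, use_cases: list[str], constraints: list[str]) -> str:
    """Frame the model as an adversarial QA engineer rather than a builder."""
    lines = [
        "You are a senior QA engineer. Your job is to find what breaks, not what works.",
        f"System under test: {system}",
        "Known use cases:",
        *[f"- {u}" for u in use_cases],
        "Constraints:",
        *[f"- {c}" for c in constraints],
        "List the edge cases, failure modes, and adversarial inputs most likely",
        "to break this system, and for each one explain how to reproduce it.",
    ]
    return "\n".join(lines)

# Hypothetical example inputs for illustration only.
prompt = build_breakage_prompt(
    system="checkout service handling payments in 12 currencies",
    use_cases=["guest checkout", "saved cards", "partial refunds"],
    constraints=["p99 latency under 300ms", "PCI compliance"],
)
```

The point is the framing, not the wrapper: the same model that writes the feature will enumerate failure modes when the prompt makes breaking, not building, the goal.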

The tooling is early and the practice of structuring prompts for quality rather than production is not yet standard. But the direction is clear.

What QA looks like in an AI-powered world

Several patterns are emerging for how teams are restructuring quality work around AI.

Spec-first development and LLM-assisted UAT. Rather than writing code and then writing tests, teams are exploring models where the specification is the first artifact — and where LLMs evaluate whether a built system actually meets it. User Acceptance Testing has always been labor-intensive. LLMs change that by making it possible to translate natural-language requirements into structured quality criteria and test against them.
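One way to make "spec as the first artifact" concrete is to represent each natural-language requirement as a structured criterion that can be checked against observed system behavior. The schema below is a sketch under assumed names (`AcceptanceCriterion`, `evaluate`, and the example response fields are all illustrative), not a standard or a Cortex feature.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AcceptanceCriterion:
    """A spec sentence paired with a machine-checkable predicate."""
    requirement: str                      # the natural-language requirement, verbatim
    check: Callable[[dict], bool]         # predicate over an observed system response

def evaluate(criteria: list[AcceptanceCriterion], response: dict) -> list[str]:
    """Return the requirements the observed response fails to meet."""
    return [c.requirement for c in criteria if not c.check(response)]

# Two criteria an LLM might extract from a written spec (illustrative).
criteria = [
    AcceptanceCriterion(
        requirement="Search returns results within 2 seconds",
        check=lambda r: r["latency_ms"] <= 2000,
    ),
    AcceptanceCriterion(
        requirement="Empty queries return a validation error, not a crash",
        check=lambda r: r["empty_query_status"] == 400,
    ),
]

failures = evaluate(criteria, {"latency_ms": 3500, "empty_query_status": 400})
# failures == ["Search returns results within 2 seconds"]
```

In the LLM-assisted version of this workflow, the model does the translation step, turning spec prose into structured criteria, and the evaluation loop stays cheap to run on every release.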

AI QA tooling as a distinct category. A wave of tools is emerging aimed at catching what automated testing misses — using LLMs to review diffs for risky patterns, generate adversarial test cases, or flag edge-case failures that scripted tests don't reach. This category is early but growing quickly.
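The diff-review idea can be illustrated with a deliberately simplified scanner. Real tools in this category use LLMs rather than regexes, but even a small pattern list makes the workflow concrete: scan the added lines of a diff and flag known-risky constructs. The patterns below are illustrative, not exhaustive.

```python
import re

# Illustrative risky patterns an AI review pass might flag in added lines.
RISKY_PATTERNS = {
    r"except\s*:\s*pass": "silently swallowed exception",
    r"verify\s*=\s*False": "TLS verification disabled",
    r"eval\(": "eval on possibly untrusted input",
}

def flag_risky_lines(diff: str) -> list[tuple[int, str]]:
    """Scan added lines ('+' prefix) of a unified diff for risky patterns."""
    findings = []
    for i, line in enumerate(diff.splitlines(), start=1):
        if not line.startswith("+"):
            continue
        for pattern, reason in RISKY_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((i, reason))
    return findings

diff = """\
+ resp = requests.get(url, verify=False)
- resp = requests.get(url)
+ data = resp.json()"""
print(flag_risky_lines(diff))  # [(1, 'TLS verification disabled')]
```

An LLM-backed reviewer generalizes this: instead of a fixed pattern list, it can reason about context, for example that disabling TLS verification might be acceptable in a test fixture but not in production code.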

Code review quality as a meaningful signal. As AI-generated code scales, the quality of the review process becomes crucial. When LLMs produce large volumes of code quickly, engineers risk approving code they don't fully understand. Teams that track review quality as an explicit metric are better positioned to catch issues than teams that assume review is happening and rely on unit tests that may not reflect actual system behavior.

The direction question. Speed without direction is just waste moving faster. AI tools have significantly lowered the cost of experimentation. That means more ideas tested more quickly, more assumptions validated, and more exploration, but the teams that get this right are the ones also asking whether they're testing the right things in the first place.

What this means for engineering leaders

The engineering leaders who navigate this new normal well avoid two opposite mistakes.

Mistake #1: Treating AI output as inherently trustworthy because it was generated by a system. AI-generated code still needs to meet your organization's standards for quality, security, and compliance. The volume and velocity of AI production don't relax that requirement. Rather, they intensify it.

Mistake #2: Trying to manage AI-driven quality risk through manual processes that can't scale. If the answer to faster output is more manual review, the math doesn't work.

What does work is a governance layer that scales with output: clear standards, automated measurement against those standards, and visibility into where quality is degrading before it becomes an incident. Scorecards that flag missing test coverage or stale runbooks don't require manual checks. They surface the gaps automatically.

The adversarial mindset Zuber describes, asking "what will break?" instead of "what should I build?", is still the right model for quality work. The opportunity now is to encode it into the systems and tooling developers use every day, so it doesn't depend on any one person carrying the load.

How Cortex fits into a modern QA workflow

Cortex provides the governance layer that makes quality standards operational rather than aspirational. As AI tools push code volume higher, the ability to measure every service automatically against your organization's definition of "good" becomes more important, not less.

With Cortex, engineering teams can:

  • Track quality standards at scale with automated Scorecards measuring every service against criteria like test coverage, code review quality, security posture, and documentation, without manual reporting

  • Maintain visibility into service health as the codebase grows, so the services where quality is slipping surface before they cause incidents

  • Enforce standards from the start with service scaffolding that builds quality expectations into new services by default

  • Give leadership a coherent picture of where quality is holding and where it isn't, so the conversation is about what's red and why, not whether anyone remembered to check

AI is changing how fast teams can build. The organizations that proactively manage quality as velocity increases are the ones with a system, not just a policy.

Book a demo to see how Cortex helps engineering teams maintain quality standards at scale.

Get started with Cortex