There’s a particular flavor of regression that’s become depressingly common over the past two years. The code compiles, the tests pass, CI is green, and the PR even got a couple of LGTMs. And then, a week or two after deploy, something subtly wrong starts happening, often a serializer that doesn’t match its model and quietly drops fields from API responses.

Welcome to the AI slop era. The class of bug we’re shipping has changed, but the tools we use to catch it haven’t. Scrubby is a direct response to this. It’s a way to make sure that the code being added to your codebase actually fits the codebase, before it ever reaches production.

What “stable release” means in 2026

It used to be that release stability was mostly about logic correctness. Did this code do what it was supposed to do? Did it handle edge cases? Did it pass tests? Static analysis caught some classes of issue, linters caught syntax, and tests caught regressions in known behavior.

That model is incomplete now. The new failure modes look like this. An AI agent generates a service that looks like the rest of the codebase but uses a slightly different error-handling pattern. Tests pass. In production, certain error paths are now silently swallowed because the surrounding infrastructure was written assuming the canonical pattern.

That kind of failure wouldn’t be caught by any conventional CI tool, because it’s not a logic error in the traditional sense. It’s a fit error. The code is technically correct, but it doesn’t match the patterns the rest of the codebase relies on.

Research from CodeRabbit found that AI-authored code has 75% more logic errors and 2.74× more security issues than human-authored equivalents. iSync’s analysis of the AI code slop crisis put the OWASP Top 10 vulnerability rate in AI-generated code at 45%. These numbers don’t show up as red CI builds. They show up as production incidents weeks later.

Why generic tools can’t catch this

Linters enforce universal rules, which are the rules of the language. Static analysis enforces the rules of correctness, and generic AI reviewers enforce general best practices.

None of those layers know your codebase’s actual conventions. They can’t tell you that this particular controller is missing the Authenticatable concern that every other controller in this domain includes, because they have no model of what changes with what in your repository.

This is the gap codebase intelligence fills.

How Scrubby catches what other tools miss

Scrubby builds a structured understanding of your repository (covering domains, connections, conventions, and co-change patterns) and applies it to every changeset before merge. The findings break down into four categories that map directly onto the failure modes above.

Convention violations. When code is added that doesn’t match the patterns of the domain it’s in, Scrubby flags it. The patterns in question aren’t generic rules. They are the specific ones your team has actually been using, derived from your real history. A typical finding reads: “Controllers in the billing domain consistently include BillingPolicies (47 of 49 controllers do). This new controller is missing it.”
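
To make the idea concrete, here is a minimal sketch of a convention check of this kind. It is not Scrubby’s implementation. The function name, the `threshold` parameter, and the input shape (a mapping from peer file to the set of modules it includes) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConventionFinding:
    convention: str
    adopters: int
    total: int
    message: str


def check_convention(existing, new_file_includes, convention, threshold=0.9):
    """Flag a new file that omits something most of its peers include.

    existing: dict mapping peer file path -> set of included module names
              (hypothetical input shape, assumed for this sketch).
    new_file_includes: set of module names the new file includes.
    threshold: assumed cutoff for calling a pattern a convention.
    """
    total = len(existing)
    if total == 0:
        return None
    adopters = sum(1 for includes in existing.values() if convention in includes)
    # Not a convention if too few peers follow it; nothing to flag if the new file complies.
    if adopters / total < threshold or convention in new_file_includes:
        return None
    return ConventionFinding(
        convention,
        adopters,
        total,
        f"{adopters} of {total} peers include {convention}; this file does not.",
    )
```

With 47 of 49 billing controllers including BillingPolicies, a new controller without it would produce a finding, while one that includes it would not.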

Domain boundary crossings. When a change in one domain reaches into another in a way the codebase doesn’t normally do, Scrubby surfaces it. Sometimes the crossing is intentional and the right call. Other times it’s an AI agent that didn’t realize there’s a facade pattern in place to keep notifications from depending on billing directly. Either way, it gets a second look before it ships.
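
A rough sketch of what a boundary check could look like follows. It assumes domains can be read off the directory layout and that the set of domain-to-domain dependencies already present in the codebase is known; both the `app/services/` root and the `allowed` set are hypothetical, not something Scrubby documents.

```python
def domain_of(path, root="app/services/"):
    """Return the domain directory of a path, or None if it has none.

    Assumes (for this sketch only) that domains are the first directory
    below `root`.
    """
    if not path.startswith(root):
        return None
    head, sep, _ = path[len(root):].partition("/")
    return head if sep else None


def boundary_crossings(dependencies, allowed):
    """Return dependencies that cross domains in a way not seen before.

    dependencies: list of (source_path, target_path) pairs in the changeset.
    allowed: set of (source_domain, target_domain) pairs the codebase
             already uses (hypothetical input, e.g. mined from history).
    """
    findings = []
    for src, dst in dependencies:
        src_domain, dst_domain = domain_of(src), domain_of(dst)
        if src_domain is None or dst_domain is None or src_domain == dst_domain:
            continue
        if (src_domain, dst_domain) not in allowed:
            findings.append((src, dst, src_domain, dst_domain))
    return findings
```

In this sketch, a notifications file that depends on billing directly is reported, while the same file going through an established facade domain is not.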

Missing co-changes. When a file is modified, Scrubby checks the commit history to find which other files have usually changed alongside it. If the changeset is missing them, that’s a strong signal something’s incomplete. “This model has been changed 23 times in the last 18 months, and 21 of those changes also touched its serializer. The serializer is not in this changeset.”
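
The co-change idea can be sketched in a few lines. This is an illustration under stated assumptions, not how Scrubby computes it: history is given as a list of sets of file paths (one set per commit), and the `min_ratio` and `min_changes` cutoffs are invented for the example.

```python
from collections import Counter


def co_change_counts(commits, target):
    """Count how often each other file changed in the same commit as target.

    commits: list of sets of file paths, one set per historical commit.
    Returns (number of commits touching target, Counter of partner files).
    """
    touched = 0
    together = Counter()
    for files in commits:
        if target in files:
            touched += 1
            together.update(f for f in files if f != target)
    return touched, together


def missing_co_changes(commits, changeset, min_ratio=0.8, min_changes=5):
    """List files that usually change with something in changeset but are absent.

    min_ratio and min_changes are assumed thresholds for this sketch.
    Returns (changed_file, missing_partner, times_together, times_changed) tuples.
    """
    findings = []
    for target in sorted(changeset):
        touched, together = co_change_counts(commits, target)
        if touched < min_changes:
            continue  # too little history to say anything
        for partner, count in sorted(together.items()):
            if partner not in changeset and count / touched >= min_ratio:
                findings.append((target, partner, count, touched))
    return findings
```

Fed a history in which a model changed 23 times and its serializer changed alongside it 21 times, a changeset containing only the model yields one finding that names the serializer.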

Wrong-place additions. When a new file is added in a location that doesn’t match the conventions of similar files, Scrubby flags it. “New services in this codebase are consistently added to app/services/<domain>/. This new service was added at the top level.”
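
One simple way to approximate a placement check is to compare how deep existing files sit below a root directory with where the new file landed. The sketch below assumes that approach; the root path, the `threshold`, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def depth_below(path, root):
    """Number of directories between root and the file, or None if outside root."""
    try:
        relative = PurePosixPath(path).relative_to(root)
    except ValueError:
        return None
    return len(relative.parts) - 1


def wrong_place(existing, new_path, root="app/services", threshold=0.9):
    """Return a message if new_path sits at the top level while peers are nested.

    existing: list of paths of similar files already in the repository.
    threshold: assumed share of nested peers needed to call it a convention.
    """
    depths = [d for d in (depth_below(p, root) for p in existing) if d is not None]
    new_depth = depth_below(new_path, root)
    if not depths or new_depth is None:
        return None
    nested = sum(1 for d in depths if d >= 1)
    if nested / len(depths) >= threshold and new_depth == 0:
        return (
            f"{nested} of {len(depths)} files under {root}/ live in a "
            f"subdirectory; {new_path} does not."
        )
    return None
```

A service added directly under app/services/ is flagged when its peers all live in app/services/<domain>/, and a service placed inside a domain directory passes.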

These four categories are exactly the failure modes that have become more common as AI agents generate more code. They’re also exactly the failure modes most likely to slip through code review, because they require holistic knowledge of the codebase that no individual reviewer has fully in their head.

The “ship faster, ship safer” tension dissolves

For most of the history of software engineering, ship faster and ship safer have been in tension. You could speed up by cutting corners on review, or you could lock things down with stricter gates and accept slower throughput. Most teams chose a point on the curve that felt right.

AI tooling has scrambled that curve. Code is being generated faster than ever, but when the generated code isn’t trustworthy, shipping faster and shipping safer become actively opposed. More code means more risk, more incidents, more rework, and slower net progress.

Codebase intelligence is what lets you have both again. The AI agents writing the code have access to the conventions and architecture they need to produce code that fits. The review layer catches anything they miss before merge. The result is more code being shipped, with a lower defect rate than the team had before AI tooling was in the picture.

This isn’t theoretical. It’s the direct, mechanical consequence of giving AI tools the codebase context they were missing, plus a review layer specifically designed to catch the failure modes AI tools are most prone to.

What ships with Scrubby in the loop

A few patterns we see consistently:

  • Convention drift slows down. The slow accumulation of slightly-off code that compounds into tech debt mostly stops, because the slightly-off code never makes it past review.
  • Co-change misses go to near-zero. The “model updated, serializer wasn’t” class of bug effectively disappears.
  • Hot domains get visibility. Scrubby’s tracking of change velocity means you can see when a domain that should be stable is getting touched constantly, which is usually a sign that something underneath is wrong.
  • Production incidents shift in character. The incidents you do have are more often genuine logic bugs (the hard kind, where smart people disagreed about the right approach) and less often “structural” bugs (the easy-in-retrospect kind that should have been caught in review).

The first three are upstream. The fourth is what your release manager actually cares about: a production incident profile that’s smaller and more concentrated on the kinds of issues no automated tool was ever going to catch anyway.
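
The hot-domain signal in particular lends itself to a small illustration. The sketch below compares a domain’s recent change rate with its own earlier baseline. The window sizes, the `factor` cutoff, and the input shape are assumptions for the example and say nothing about how Scrubby actually measures change velocity.

```python
from collections import Counter
from datetime import datetime, timedelta


def hot_domains(changes, now, window_days=30, baseline_days=180, factor=2.0):
    """Return domains whose recent change rate is at least factor x their baseline.

    changes: list of (timestamp, domain) pairs, one per change (hypothetical input).
    The baseline is the baseline_days period immediately before the recent window.
    Domains with no baseline activity are skipped, since there is nothing to compare.
    """
    window_start = now - timedelta(days=window_days)
    baseline_start = window_start - timedelta(days=baseline_days)
    recent, baseline = Counter(), Counter()
    for timestamp, domain in changes:
        if window_start < timestamp <= now:
            recent[domain] += 1
        elif baseline_start < timestamp <= window_start:
            baseline[domain] += 1
    hot = {}
    for domain, count in recent.items():
        if baseline[domain] == 0:
            continue
        ratio = (count / window_days) / (baseline[domain] / baseline_days)
        if ratio >= factor:
            hot[domain] = ratio
    return hot
```

A domain that is changing several times faster than its own history shows up in the result, while one that is ticking along at its usual pace does not.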

How to roll it out for release stability specifically

If your goal is concrete improvement to release stability, the practical sequence is:

  1. Connect the GitHub App to your highest-velocity repo first. This is where convention drift is happening fastest, and where Scrubby’s findings will be most immediately useful.
  2. Pay attention to the first 50 PRs Scrubby reviews. Some findings will be obvious wins. Others will be surprising patterns your team has been enforcing implicitly that you didn’t realize were that consistent. Both are valuable.
  3. Wire up the MCP server for your AI agents. This shifts the prevention upstream. Code that would have been caught by the GitHub App now isn’t generated in the first place.
  4. Track defect rate over the next few release cycles. The signal won’t be subtle. Convention violations and missing co-changes are a meaningful fraction of post-deploy incidents in most codebases.

The broader picture

Stable releases aren’t really about catching every possible bug before deploy. They’re about reducing the expected cost of every change you ship. That cost is some combination of the probability the change goes wrong, the severity of what happens when it does, and the time it takes to recover.

Codebase intelligence reduces all of that at once. The probability drops because AI-generated code fits the codebase from the start. The severity drops because the failure modes that slip through are smaller and more contained. The recovery time drops because the architecture is now visible and queryable, so when something does go wrong, the blast radius is obvious.
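
The post doesn’t commit to a formula for that cost, so treat the following as one hypothetical way to combine the three factors: the probability of failure multiplied by the sum of the direct severity and the cost of the recovery time. The function, its parameters, and any numbers you plug in are illustrative assumptions, not measurements.

```python
def expected_change_cost(p_failure, severity_cost, recovery_hours, cost_per_hour):
    """Expected cost of shipping one change under a simple assumed model.

    cost = P(failure) * (direct severity cost + recovery time * hourly cost)
    All inputs are hypothetical and in whatever currency unit you choose.
    """
    if not 0.0 <= p_failure <= 1.0:
        raise ValueError("p_failure must be between 0 and 1")
    return p_failure * (severity_cost + recovery_hours * cost_per_hour)
```

Under this model the three reductions compound: lowering the probability scales the whole expression, while lowering severity or recovery time shrinks the term being scaled.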

If you’ve been feeling like your release stability has gotten a little wobblier as your team’s AI usage has grown (as most teams have), Scrubby is one of the more direct interventions you can make. It’s the layer the rest of your tooling has been missing.

Sources: