Lowering Token Consumption with Codebase Intelligence

If you’ve watched a modern AI coding agent work through a non-trivial task, you’ve seen this pattern. It reads a file. Then another file. Then it greps for something. Then it reads three more files to understand the relationships. Then, finally, it writes a few lines of code.

That dance is the agent groping for context, and it’s where most of your token bill goes. Recent analysis suggests AI coding agents waste up to 80% of their tokens just finding things instead of solving the problem you actually asked them to solve. And as conversations get longer, it gets worse: the entire history is re-read on every turn, because the model has no persistent memory of what it learned five minutes ago.

This is the problem Scrubby was, in part, designed to solve. Codebase intelligence dramatically reduces the amount of exploratory reading an agent has to do, and the savings show up directly in your token usage.

The shape of the problem

A modern coding agent in a non-trivial repo spends its tokens roughly like this:

System prompt and tool definitions. Fixed cost, but not small.
Conversation history. Grows with every turn and re-paid every turn.
Tool calls and their results. Every read_file, every grep, every ls adds to the pile of tokens in context.
Actual reasoning and code generation. This is the part you’re paying for, in theory.

The first three categories are necessary overhead. The fourth is the actual work. In a typical session, the work is a small fraction of the total, and the overhead is most of it.

The issue is that without a structured map of your codebase, the agent has no choice but to discover everything from scratch every time. It reads a controller to understand what it does, then a model, then a service, then a test file, and then it greps for where a function is called. Each of those operations puts more tokens into context, and most of those tokens won’t be relevant to the actual change being made.

Anthropic’s own guidance on prompt caching and context engineering makes a similar point: the gains from optimizing what’s in the context window often dwarf the gains from optimizing the model itself. Bigger context windows are not the answer. Sharper, more relevant context is.

What codebase intelligence does to that math

Scrubby gives the agent a different way to get context. Instead of grepping and reading until it has enough information to proceed, the agent issues structured queries like:

“What does this file do, what domain is it in, and what conventions apply?” (scrubby_review)
“What are the architectural domains of this repo and how are they connected?” (scrubby_get_network)
“Before I commit, are there files that historically change with these but are missing from my changeset?” (scrubby_review_changeset)

These return structured, dense answers. A scrubby_review call gives the agent a file summary, the domain, the relevant conventions, and the connected domains in a few hundred tokens. To get the same picture by reading code directly, the agent might need to read a dozen files and grep for several patterns, easily 10 to 20× the tokens for a fuzzier result.

This is the same pattern Anthropic and the rest of the industry have been advocating under the banner of “context engineering”: rather than expanding the context window, replace exploration with retrieval of pre-structured knowledge.

The compound effect on long sessions

The savings get larger the longer the session runs. Here’s why.

In a long agent session, the conversation history dominates the token cost. Every turn, the entire history of tool results, file contents, and reasoning is re-sent to the model. If the agent did 15 read_file operations on its way to writing one function, those 15 file contents are now permanently in context. Every subsequent turn pays for them again.
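To see how quickly that compounds, here is a back-of-the-envelope sketch. It assumes every turn re-sends the full accumulated context, and the token sizes (a 5,000-token system prompt, 2,000 tokens per file read, 300 per structured query, 500 per working turn) are made-up illustrative numbers, not measurements from any real session.

```python
def cumulative_input_tokens(base_tokens, tokens_added_per_turn):
    """Total input tokens billed over a session in which every turn
    re-sends everything accumulated so far."""
    total = 0
    context = base_tokens
    for added in tokens_added_per_turn:
        total += context   # this turn pays for the whole context again
        context += added   # this turn's result then joins the history
    return total

# Illustrative assumptions only: sizes are invented for the example.
BASE = 5_000          # system prompt and tool definitions
FILE_READ = 2_000     # one read_file result
QUERY = 300           # one compact structured answer
WORK_TURN = 500       # one reasoning/code-writing turn

# 15 exploratory reads, then 5 working turns.
exploration = [FILE_READ] * 15 + [WORK_TURN] * 5
# 2 structured queries plus 2 targeted reads, then the same 5 working turns.
retrieval = [QUERY] * 2 + [FILE_READ] * 2 + [WORK_TURN] * 5

exploration_total = cumulative_input_tokens(BASE, exploration)
retrieval_total = cumulative_input_tokens(BASE, retrieval)

print(f"exploration-heavy session: {exploration_total:,} input tokens")
print(f"retrieval-first session:   {retrieval_total:,} input tokens")
print(f"ratio: {retrieval_total / exploration_total:.2f}")
```

The exact ratio depends entirely on the sizes you plug in. The point is the shape: tokens added early in a session are paid for on every later turn, so trimming the early turns pays off more than once.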

When the agent uses Scrubby instead, the structured queries return compact answers. The session’s conversation history stays small, and the cost per turn doesn’t balloon over time. We’ve seen agents using Scrubby finish complex multi-file tasks at a fraction of the token cost they incur without it, sometimes 40 to 95% lower depending on the task and codebase, in line with what other code-knowledge-graph approaches have reported.

Cache hits, not cache misses

There’s a second mechanism at play here. Anthropic’s prompt caching gives a 90% discount on input tokens that are read from the cache. But cache hits require stable prefixes, meaning the same prompt text in the same order across requests. When an agent’s context is full of variable file content from arbitrary read operations, cache hits are sporadic. When the agent’s context is stabilized around a small set of structured Scrubby queries, requests hit the cache more often.
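A toy model makes the prefix requirement concrete. The sketch below treats each request as an ordered list of named segments and discounts only the leading segments that match the previous request. The segment names and sizes are hypothetical, and the model deliberately ignores cache-write pricing, cache expiry, and any provider-specific rules about what can be cached.

```python
CACHE_READ_MULTIPLIER = 0.1  # the 90% discount quoted above

def shared_prefix_len(previous, current):
    """Count the leading segments that are identical in both requests."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n

def billed_input_units(previous, current, sizes):
    """Input cost of `current`, in base-price token units, assuming the
    whole of `previous` is already cached."""
    hit = shared_prefix_len(previous, current)
    cached = sum(sizes[s] for s in current[:hit])
    fresh = sum(sizes[s] for s in current[hit:])
    return cached * CACHE_READ_MULTIPLIER + fresh

# Hypothetical segment sizes in tokens.
sizes = {"system": 4000, "tools": 2000, "summary": 400,
         "file_a": 2500, "file_b": 2500, "question": 100}

# Context built around one compact structured summary: the prefix holds.
stable_cost = billed_input_units(
    ["system", "tools", "summary"],
    ["system", "tools", "summary", "question"],
    sizes,
)

# Context built from ad hoc file reads: the requests diverge after "tools".
divergent_cost = billed_input_units(
    ["system", "tools", "file_a", "file_b"],
    ["system", "tools", "file_b", "question"],
    sizes,
)

print(f"stable prefix:    {stable_cost:.0f} token units")
print(f"divergent prefix: {divergent_cost:.0f} token units")
```

Everything after the first differing segment is billed at full price, which is why the order and stability of what goes into context matters as much as its size.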

This isn’t a marginal effect. For teams running coding agents at scale, it’s the difference between AI tooling being a meaningful line item and AI tooling being a small one. It’s also the difference between hitting your model’s rate limits regularly and not having to think about them.

Why “just expand the context window” doesn’t work

You’d think bigger context windows would solve this. They don’t, for two reasons.

First, more tokens isn’t the same as more relevant tokens. Studies on long-context LLM performance consistently show degradation as context grows, where the model’s ability to reason over the contents drops even when it can technically fit them all. Stuffing the context window doesn’t make the model smarter. It makes the model less attentive to what matters.

Second, the cost scales with what you put in. Even on a million-token context window, you pay per token. Token waste is token waste, regardless of how much room you have to waste it in.

The actual fix is retrieval, which means giving the agent a way to ask precise questions and get precise answers. That’s what Scrubby’s MCP server is. It’s codebase intelligence as retrieval-augmented generation, with the retrieval layer designed specifically for codebases.

What this looks like in your bill

If you’re running an AI coding workflow at any scale, codebase intelligence shows up in your usage data within a few weeks of adoption.

The places it shows up:

Average tokens per task drop substantially. Tasks that used to take 15 file reads now take 1 or 2 Scrubby queries plus a couple of targeted reads.
Conversations stay shorter. The agent gets to the right answer in fewer turns because the early turns aren’t burned on orientation.
Cache hit rates climb. Stable prefixes around structured queries make caching meaningfully more effective.
Rate limit hits drop. This one matters operationally, since fewer interruptions means more flow.
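If you want to watch the first and third of these in your own data, a minimal sketch follows. It assumes you log one record per API request with the token counts from the response’s usage object (the four field names below follow Anthropic’s Messages API) plus a `task_id` tag, which is a hypothetical label you would have to attach yourself. The sample records are invented.

```python
from collections import defaultdict

def summarize(usage_records):
    """Average tokens per task and the share of input tokens served from cache."""
    per_task = defaultdict(int)
    cache_read = 0
    total_input = 0
    for r in usage_records:
        # Total input is the uncached part plus cache writes plus cache reads.
        request_input = (r["input_tokens"]
                         + r["cache_creation_input_tokens"]
                         + r["cache_read_input_tokens"])
        per_task[r["task_id"]] += request_input + r["output_tokens"]
        cache_read += r["cache_read_input_tokens"]
        total_input += request_input
    return {
        "avg_tokens_per_task": sum(per_task.values()) / len(per_task) if per_task else 0.0,
        "cache_hit_rate": cache_read / total_input if total_input else 0.0,
    }

# Invented sample records, for illustration only.
records = [
    {"task_id": "t1", "input_tokens": 1000, "cache_creation_input_tokens": 5000,
     "cache_read_input_tokens": 0, "output_tokens": 500},
    {"task_id": "t1", "input_tokens": 500, "cache_creation_input_tokens": 0,
     "cache_read_input_tokens": 6000, "output_tokens": 500},
    {"task_id": "t2", "input_tokens": 1000, "cache_creation_input_tokens": 0,
     "cache_read_input_tokens": 5000, "output_tokens": 1000},
]

stats = summarize(records)
print(f"average tokens per task: {stats['avg_tokens_per_task']:,.0f}")
print(f"cache hit rate:          {stats['cache_hit_rate']:.1%}")
```

Comparing these two numbers for the weeks before and after adoption is a simple way to check whether the effect is real for your workload.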

For a team using Claude Code or Cursor heavily, the dollar savings can be material. The deeper benefit is that the agent is doing better work in less time, because it’s not spending most of its capacity on the wrong thing.

Why the savings matter beyond cost

Here’s the thing about token efficiency: cost is almost the least interesting part of it.

When an agent isn’t burning tokens on exploration, two other things happen at the same time. Its outputs get sharper, because a focused context produces better reasoning than a bloated one (an effect that is well documented across long-context evaluations). And complex multi-step tasks become tractable, because tasks that would have hit a context limit halfway through can now actually complete.

In other words, codebase intelligence saves you tokens and makes the tokens you do spend more valuable. That’s a rare combination in any optimization.

The takeaway

Token consumption isn’t really a token problem. It’s a context problem dressed up as a token problem. AI coding agents waste tokens because they’re forced to discover the same things about your codebase, over and over again, every session.

Scrubby breaks that loop. Once the agent has a structured, queryable representation of your codebase, exploration gets replaced with retrieval, conversation history stays small, and cache hits go up. The token savings follow automatically, and the work the agent does with the tokens it spends gets meaningfully better at the same time.

If you’re feeling the cost of AI coding agents (financially, operationally, or in terms of how much they actually accomplish per task), codebase intelligence is one of the few interventions that improves all three at once.

