
Token Budgets, Model Selection, and the Art of AI Pair Programming

AI Tooling · Developer Experience · Productivity · LLM

After months of daily AI pair programming, patterns emerge. Not the kind you read about in launch posts — the messy, practical ones about when to start a new session, which model to use for what, and how to stop the context window from becoming a liability.

These are the patterns I keep coming back to.

The Context Window Is a Resource, Not a Feature

Every AI coding session has a finite context window. Early on, the assistant has perfect recall of your codebase, your instructions, and the conversation. By the end of a long session, it's compressing earlier messages, losing nuance, and sometimes contradicting its own earlier decisions.

Put simply: session freshness matters more than session length.

Signs your session has degraded:

  • The assistant re-reads files it already read
  • Suggestions contradict earlier architectural decisions
  • It starts hallucinating function signatures that don't exist
  • You find yourself repeating instructions

When this happens, start fresh. Don't grind through a degraded session — the time you spend correcting errors exceeds the time you'd spend re-establishing context.
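One way to make "start fresh" less of a gut call is to track rough token usage against the context budget. This is a minimal sketch with made-up numbers — the budget, threshold, and the 4-characters-per-token rule of thumb are all illustrative assumptions, not measurements from any particular tool:

```python
# Sketch: flag when a session has consumed enough context that a reset
# is likely cheaper than pushing on. All thresholds are illustrative.

def estimate_tokens(text: str) -> int:
    # Crude rule of thumb: roughly 4 characters per token for English
    # prose and code. Real tokenizers vary.
    return len(text) // 4

def should_reset(messages, budget=200_000, fresh_fraction=0.5) -> bool:
    """Suggest a fresh session once the conversation has consumed more
    than `fresh_fraction` of the context budget (hypothetical heuristic)."""
    used = sum(estimate_tokens(m) for m in messages)
    return used > budget * fresh_fraction
```

In practice the qualitative signs above (re-reading files, contradictions, repeated instructions) tend to show up before any fixed threshold does — treat a counter like this as a backstop, not the signal.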

Model Selection Is a Tactical Decision

Not every task needs your most capable (and most expensive) model. Here's the split that works for me:

High-capability model (e.g., Opus)

  • Architecture and planning — "How should I structure this feature?"
  • Complex refactors — understanding interconnected changes across files
  • Debugging subtle issues — race conditions, type system edge cases
  • Code review — catching logic errors, suggesting better patterns
  • Writing — blog posts, documentation, commit messages that need thought

Fast model (e.g., Sonnet)

  • Implementation — writing code from a clear plan
  • Boilerplate — test setup, repetitive CRUD, config files
  • Quick lookups — "What's the API for this library?"
  • Simple edits — rename a variable, fix a typo, add a type annotation
  • Running commands — build, test, lint, git operations

The split that pays off: plan with the expensive model, execute with the fast one. An Opus session that produces a clear 10-step plan can hand off to Sonnet sessions that burn through implementation cheaply.
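The plan/execute split can be sketched as a simple router. The task categories and model names below are assumptions for illustration, not a real API:

```python
# Sketch: route task types to a model tier. Categories mirror the two
# lists above; names like "opus"/"sonnet" are placeholders.

PLANNING_TASKS = {"architecture", "refactor", "debugging", "review", "writing"}
EXECUTION_TASKS = {"implementation", "boilerplate", "lookup", "edit", "command"}

def pick_model(task_type: str) -> str:
    if task_type in PLANNING_TASKS:
        return "opus"    # capable, expensive: planning and judgment calls
    if task_type in EXECUTION_TASKS:
        return "sonnet"  # fast, cheap: execution from a clear brief
    return "sonnet"      # when unsure, default to the cheap tier
```

Defaulting unknown tasks to the cheap tier is deliberate: a wrong cheap answer is quick to spot and retry, while a reflexive expensive call is pure overhead.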

Your System Prompt Is a Token Tax

Every message in your conversation pays the cost of your system prompt. If your project instructions file is 500 lines, that's thousands of tokens taxed on every single interaction.
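The back-of-envelope arithmetic makes the tax concrete. The per-line token count and messages-per-session figure below are rough assumptions:

```python
# Illustrative cost of a long system prompt, repaid on every message.
# Both constants are rough assumptions, not measurements.

lines = 500
tokens_per_line = 10           # rough average for prose and config text
messages_per_session = 40

prompt_tokens = lines * tokens_per_line                   # ~5,000 tokens
session_overhead = prompt_tokens * messages_per_session   # ~200,000 tokens

print(f"prompt: ~{prompt_tokens} tokens, "
      f"session overhead: ~{session_overhead} tokens")
```

Under these assumptions, a 500-line instructions file costs more tokens per session than many entire codebase reads.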

Strategies for keeping it lean:

  • Link, don't inline. Instead of pasting your coding standards into the system prompt, reference a file: "Follow conventions in CONVENTIONS.md."
  • Be specific, not comprehensive. Don't document every possible scenario. Cover the 5 things the AI gets wrong most often.
  • Prune regularly. Instructions that made sense three weeks ago might be irrelevant now. Treat your project config like code — review and trim.
  • Separate concerns. One focused prompt beats a sprawling document. Use per-directory or per-feature config files if your tool supports them.

MCP Servers Are Token-Hungry

Tool integrations (GitHub, Slack, databases via MCP servers) are incredibly powerful, but each tool definition eats context. If you have 30 MCP tools registered, every message pays for their schemas — even if you're not using them.

Be deliberate:

  • Only enable the integrations you need for the current task
  • Disable heavy integrations (Slack, Jira) when doing pure coding work
  • If your tool supports it, use deferred tool loading so schemas are only loaded when needed
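The overhead multiplies out quickly. Here is the same back-of-envelope math for tool schemas — the average schema size is an assumption; real definitions vary widely per tool:

```python
# Sketch: per-message token overhead from registered MCP tool schemas.
# The average schema size is an illustrative assumption.

def schema_overhead(tool_count: int, avg_schema_tokens: int = 300) -> int:
    """Tokens every message pays just for having the tools registered,
    whether or not any tool is actually called."""
    return tool_count * avg_schema_tokens

# 30 tools at ~300 tokens each is ~9,000 tokens on every single message.
```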

Session Patterns That Work

The "Scout and Build" Pattern

  1. Start a session with the capable model
  2. Explore the codebase, read relevant files, understand the problem
  3. Produce a written plan (in a file, not just in conversation)
  4. End the session
  5. Start a new session with the fast model, pointing it at the plan
  6. Execute step by step

This costs less than a single long session and produces better results because the implementation session starts with a fresh context and a clear brief.
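The mechanics of the handoff are just a file on disk. This sketch shows one way to round-trip a plan between sessions — the file name and numbered-list format are arbitrary conventions, not requirements of any tool:

```python
# Sketch: write a plan from the planning session, read it back in the
# implementation session. Format and filename are arbitrary conventions.

from pathlib import Path

def save_plan(steps: list[str], path: str = "PLAN.md") -> None:
    Path(path).write_text(
        "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    )

def load_plan(path: str = "PLAN.md") -> list[str]:
    return [
        line.split(". ", 1)[1]
        for line in Path(path).read_text().splitlines()
        if line.strip()
    ]
```

The point of writing the plan to a file rather than leaving it in conversation is exactly that it survives the session boundary.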

The "Checkpoint" Pattern

For long tasks, periodically ask the assistant to summarise the current state:

  • What's been done
  • What's remaining
  • Any decisions made

Save this to a file. If the session degrades, you can start fresh and hand the checkpoint file to the new session. It's like a save game.
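A checkpoint can be as simple as three headed sections in one file. The structure below is an arbitrary convention for illustration, not a feature of any tool:

```python
# Sketch: a save-game file a degrading session can hand to a fresh one.
# The three-section layout is an arbitrary convention.

from pathlib import Path

def write_checkpoint(done, remaining, decisions, path="CHECKPOINT.md"):
    body = "\n".join([
        "## Done", *done, "",
        "## Remaining", *remaining, "",
        "## Decisions", *decisions,
    ])
    Path(path).write_text(body)
```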

The "Narrow Scope" Pattern

Instead of "refactor the entire auth module," break it into:

  1. "Refactor the token refresh logic in auth.ts"
  2. "Update the tests for the new token refresh logic"
  3. "Update the types in auth.types.ts to match"

Smaller scope = less context needed = better results = cheaper.

Cost Mental Model

A rough way to think about AI coding costs:

  • Embedding/search: Effectively free (fractions of a cent)
  • Fast model implementation: Cheap enough to not think about per-task
  • Capable model planning: Worth the premium for complex decisions
  • Long degraded sessions: The most expensive option (you pay for tokens AND waste your own time fixing bad output)

The cheapest session is a short, focused one that gets it right the first time. The most expensive is a marathon session where you're fighting the context window.
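To put rough numbers on that comparison: suppose a marathon session burns through tokens entirely on the expensive model, while the split approach spends a fraction on planning and the rest on the cheap tier. All prices and token counts below are made-up round numbers for intuition only, not real pricing:

```python
# Illustrative comparison: marathon session vs. plan-then-execute.
# Prices ($ per million input tokens) and token counts are assumptions.

OPUS_PER_MTOK = 15.0
SONNET_PER_MTOK = 3.0

# One long degraded session, all on the capable model:
marathon = 2_000_000 * OPUS_PER_MTOK / 1_000_000

# Short planning session on the capable model, execution on the fast one:
split = (300_000 * OPUS_PER_MTOK + 900_000 * SONNET_PER_MTOK) / 1_000_000

print(f"marathon: ${marathon:.2f}, plan+execute: ${split:.2f}")
```

Under these assumptions the split approach costs a fraction of the marathon — and that's before counting the human time spent fixing a degraded session's output.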

The Uncomfortable Truth

Effective AI pair programming mostly comes down to discipline, not technique. Start new sessions more often than feels necessary — freshness beats continuity. Write plans in files, not in conversation, because plans survive session boundaries. Match the model to the task instead of defaulting to the most capable one. Keep your system prompt lean. And break big tasks into small, focused sessions.

The context window is your most precious resource. The leverage comes from knowing when to push and when to reset.