
AI Agents for Unit Test Generation: A Practical Guide

AI Agents · Testing · Claude API · Developer Tooling

"AI will replace developers" is the wrong framing. The better question: what does engineering look like when AI handles the tedious parts?

I built agentic workflows that automate unit test generation across a large micro-frontend codebase. Not as a replacement for thinking about tests, but as a force multiplier — handling the mechanical parts so engineers can focus on testing behaviour and edge cases.

Why Unit Tests?

Unit test generation is an ideal starting point for AI-augmented engineering:

  • High volume, moderate complexity — most codebases need hundreds of test files
  • Clear input/output — source file in, test file out
  • Easy to validate — the tests either pass or they don't
  • Low risk — bad tests don't break production; they just fail in CI

The Agentic Architecture

A single LLM call with "write tests for this file" produces mediocre results. The agent architecture that worked:

Step 1: Analyse

The agent reads the source file and its imports, building a dependency graph. It identifies:

  • Exported functions and components
  • External dependencies that need mocking
  • Existing type definitions
  • Related test files (if any) for style matching
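The analysis step above can be sketched as a small function. This is a minimal, regex-based illustration — `analyseSource` and its return shape are hypothetical names, and a production agent would use the TypeScript compiler API or an AST parser rather than regexes, which break on multi-line imports, re-exports, and default exports.

```typescript
// Sketch of the Analyse step: pull out exported names (test targets)
// and import specifiers (mocking candidates) from a source file.
// Regex-based for brevity; a real implementation should parse the AST.

interface SourceAnalysis {
  exports: string[];       // exported functions/components to test
  dependencies: string[];  // import specifiers that may need mocking
}

function analyseSource(source: string): SourceAnalysis {
  const exports: string[] = [];
  const dependencies: string[] = [];

  // Collect import specifiers, e.g. `import { fetchUser } from "@/lib/api"`.
  for (const m of source.matchAll(/import\s+[^"']+from\s+["']([^"']+)["']/g)) {
    dependencies.push(m[1]);
  }

  // Collect named exports, e.g. `export function useAuth(...)`.
  for (const m of source.matchAll(
    /export\s+(?:async\s+)?(?:function|const)\s+(\w+)/g,
  )) {
    exports.push(m[1]);
  }

  return { exports, dependencies };
}

const analysis = analyseSource(
  'import { fetchUser } from "@/lib/api";\nexport function useAuth() {}',
);
// analysis.exports → ["useAuth"]; analysis.dependencies → ["@/lib/api"]
```

The dependency list feeds directly into the mocking section of the plan in the next step.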

Step 2: Plan

Before writing any tests, the agent produces a test plan:

```markdown
## Test Plan: useAuth.ts

### Functions to test:
- `useAuth` hook (3 scenarios)
- `validateToken` (2 scenarios)

### Mocking required:
- `@/lib/api` — mock `fetchUser` and `refreshToken`
- `next/navigation` — mock `useRouter`

### Edge cases identified:
- Expired token with valid refresh token
- Network failure during refresh
- Missing token in storage
```

This plan is reviewed by the engineer before the agent writes any code. The human stays in the loop for what to test; the agent handles how.
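One way to make that review step reliable is to have the agent emit the plan as structured data and render the markdown from it. The shape below is illustrative, not a real schema — field names like `targets` and `edgeCases` are assumptions — but structuring the plan makes it diffable in review and lets the generation step consume it directly.

```typescript
// Sketch of the Plan step's output as a structured object (hypothetical
// shape). renderPlan produces the human-readable markdown shown above.

interface TestPlan {
  file: string;
  targets: { name: string; scenarios: number }[];
  mocks: { module: string; members: string[] }[];
  edgeCases: string[];
}

function renderPlan(plan: TestPlan): string {
  const lines = [`## Test Plan: ${plan.file}`, "", "### Functions to test:"];
  for (const t of plan.targets) {
    lines.push(`- \`${t.name}\` (${t.scenarios} scenarios)`);
  }
  lines.push("", "### Mocking required:");
  for (const m of plan.mocks) {
    const members = m.members.map((x) => `\`${x}\``).join(" and ");
    lines.push(`- \`${m.module}\` — mock ${members}`);
  }
  lines.push("", "### Edge cases identified:");
  for (const e of plan.edgeCases) lines.push(`- ${e}`);
  return lines.join("\n");
}
```

The structured form also makes it easy to enforce policy before generation, e.g. rejecting any plan with zero edge cases.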

Step 3: Generate

The agent writes tests following the project's conventions:

  • Arrange-Act-Assert pattern
  • React Testing Library for component tests
  • Vitest as the runner
  • Matches existing file naming and import patterns
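Convention-following mostly comes down to what goes into the generation prompt. Here is a hedged sketch of assembling it — the prompt wording and the `buildGenerationPrompt` helper are illustrative, not the exact prompt we use. The key idea is that the reference test file teaches conventions by example far better than prose instructions do.

```typescript
// Sketch of the Generate step's prompt assembly (prompt text is
// illustrative). The approved plan and a reference test file from the
// same project anchor the agent to existing conventions.

function buildGenerationPrompt(opts: {
  sourceFile: string;
  sourceCode: string;
  referenceTest: string;
  plan: string;
}): string {
  return [
    "Write Vitest unit tests for the source file below.",
    "Follow the Arrange-Act-Assert pattern.",
    "Use React Testing Library for component tests.",
    "Match the reference test's naming, import, and mock style exactly.",
    "",
    `Approved test plan:\n${opts.plan}`,
    "",
    `Reference test:\n${opts.referenceTest}`,
    "",
    `Source file (${opts.sourceFile}):\n${opts.sourceCode}`,
  ].join("\n");
}
```

Passing the approved plan back in closes the loop with Step 2: the agent generates only what the human signed off on.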

Step 4: Validate

The agent runs the generated tests. If any fail, it reads the error output and iterates — typically fixing mock setup issues or assertion mismatches. This loop runs up to three times before flagging for human review.
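The validate loop is simple to express in code. In this sketch, `runTests` stands in for shelling out to the test runner and `repairTests` for an LLM call that receives the failure output — both are assumed interfaces, not a real API.

```typescript
// Sketch of the Validate step: run, read failures, repair, retry —
// up to maxAttempts before flagging for human review.

interface TestRun {
  passed: boolean;
  errorOutput: string; // runner output fed back to the agent on failure
}

function validateWithRetries(
  testCode: string,
  runTests: (code: string) => TestRun,       // stand-in for `vitest run`
  repairTests: (code: string, errors: string) => string, // stand-in for an LLM repair call
  maxAttempts = 3,
): { code: string; ok: boolean } {
  let code = testCode;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = runTests(code);
    if (run.passed) return { code, ok: true };
    // Feed the failure output back so the agent can fix mock setup
    // issues or assertion mismatches.
    code = repairTests(code, run.errorOutput);
  }
  // Still failing after maxAttempts: flag for human review.
  return { code, ok: false };
}
```

Capping the attempts matters: without it, a test the agent can't fix burns tokens indefinitely instead of escalating to a human.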

Results

After rolling this out across multiple teams:

  • Significant coverage improvement on business logic within the first quarter
  • Time to write tests dropped substantially — engineers spent time reviewing and refining instead of writing from scratch
  • Generated tests surfaced real bugs that existing tests had missed

What Doesn't Work

This isn't a silver bullet. The agent struggles with:

  • Integration tests — too many moving parts, too much implicit context
  • Tests that require domain knowledge — the agent doesn't know your business rules unless you tell it
  • Highly stateful components — complex Redux/XState flows need human design
  • Tests for tests' sake — the agent will happily test trivial getters if you let it

The "AI-Augmented" Philosophy

The agent is best at mechanical work — figuring out imports, setting up mocks, writing the boilerplate assertion structure. The human is best at deciding what's worth testing and what constitutes correct behaviour.

We don't let the agent commit directly. Every generated test goes through the same code review process as handwritten code. The engineer reviews, adjusts edge cases, and approves. The agent is a first-draft writer, not an autonomous actor.

Getting Started

If you want to build something similar:

  1. Start with a single, well-tested file as your reference. The agent needs examples of your conventions.
  2. Use the plan step. It's tempting to skip straight to generation, but the plan catches misunderstandings early.
  3. Invest in your prompt's context window. Include your test utilities, custom matchers, and mock helpers.
  4. Measure coverage improvement, not just generation speed. The goal is better tests, not faster tests.
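For point 4, the measurement can be as simple as diffing coverage summaries before and after rollout. The summary shape below mirrors a subset of what coverage tools emit as JSON; the exact field names here are illustrative.

```typescript
// Sketch: compare statement-coverage percentages between two coverage
// summaries (shape is a simplified stand-in for a tool's JSON summary).

interface CoverageSummary {
  statements: { covered: number; total: number };
}

function coverageDelta(before: CoverageSummary, after: CoverageSummary): number {
  const pct = (s: CoverageSummary) =>
    (100 * s.statements.covered) / s.statements.total;
  return pct(after) - pct(before); // percentage-point change
}
```

Track this per package over time; a one-off number after a generation sprint says little about whether the tests are being maintained.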

AI-augmented engineering isn't about removing the human. It's about removing the friction so the human can focus on the interesting problems.