How Granular Should a Spec Be — One Per Feature, Subsystem, or Milestone

Use one spec per shippable behavior change. In a 2023 McKinsey survey of 1,994 business leaders across 105 countries, 42% said they were regularly using generative AI tools in at least one function. As that usage spreads, small, clear specs become more valuable, because agents handle bounded work better than sprawling milestones.

A lot of the backlash against spec-driven development is deserved. People call it "waterfall with markdown", a productivity trap, frozen docs, token-burning ceremony. That happens when the spec is the wrong size. Too big, and your coding agent gets lost in mushy goals. Too small, and it loses the context that makes the change coherent.

The useful rule is simple. Write the smallest spec you would still review as one independent change. Not one milestone by default. Not one doc per tiny task. One shippable behavior change.

That tension shows up constantly when you build with Cursor, Claude Code, Codex, or Gemini. One Reddit user trying to make spec-driven development work with AI agents put it plainly: "The biggest hurdle I'm facing is determining the right level of detail. Too vague and the AI hallucinates, too detailed and I might as well write the code myself." That is the problem, not whether specs are good in the abstract. If you've hit the vibe coding wall and prompting wall that pushed people toward SDD, granularity is usually the bottleneck.

Table of Contents

Why Spec Granularity Is The New Bottleneck
The Three Levels of Spec Granularity
The Default Rule: One Spec Per Shippable Change
- What counts as one spec
- Three concrete examples
Exceptions: When to Zoom Out
- When subsystem beats feature
- When milestone beats both
A Practical Spec Template for AI Agents
- The template
- Adjust granularity as your process matures
Building Your Spec-Driven Workflow
- The manual version
- The opinionated version

Why Spec Granularity Is The New Bottleneck

Spec quality matters less than is often assumed. Spec size matters more.

If the doc covers a whole release, your AI agent has no clean target. Acceptance criteria turn into stakeholder language. Testing gets fuzzy. The agent starts making guesses across too many files, too many assumptions, too many hidden dependencies.

If the doc is too atomized, you get the opposite failure. You split one coherent behavior across five mini-specs. The agent follows each one precisely and misses the underlying intent.

Practical rule: If you wouldn't review the resulting code as one PR-sized change, the spec is probably the wrong size.

Many teams make a common error: they copy human-team requirements habits into AI-agent workflows. Humans can infer missing context from meetings, code history, and hallway chat. Agents can't. They need boundaries, not ceremony.

The requirements literature gives a better target than most AI tooling discourse does. IREB defines granularity as the degree of detail in a specification and recommends a level that is the "best reference for an unambiguous definition of software scope" in its article on functional requirements and their levels of granularity. That's the right standard. Not detailed for its own sake. Detailed enough that the scope is unambiguous.

The Three Levels of Spec Granularity

You really have three useful levels. Everything else is a variant.

Figure: the three levels of specification granularity (milestone, subsystem, and feature).

Feature spec

A feature spec defines one user-visible or system-visible behavior change you can ship and verify on its own.

Example: add a forgot-password link and working reset flow entry point on the login screen.

Subsystem spec

A subsystem spec covers one bounded area of the system where several related changes need shared context.

Example: redesign the entire password recovery subsystem, including token handling, email templates, audit logging, and admin reset rules.

A house analogy works here. This is the plumbing plan, not one faucet.

A useful reference point for boundaries is Addy Osmani's guide to writing a good spec for AI agents, especially his three-tier boundary system. It pushes you to define what the agent can always do, what needs approval, and what it must not do.

Milestone spec

A milestone spec defines an outcome, not an implementation slice.

Example: ship V1 authentication for beta launch.

That can be useful for planning. It's usually bad as the direct input to an AI coding agent.

| Level | Best for | Risk |
|---|---|---|
| Feature | Implementation and testing | Too narrow if the change is cross-cutting |
| Subsystem | Coordinated changes in one bounded area | Can sprawl if scope isn't capped |
| Milestone | Planning and sequencing | Too vague for direct coding |

The Default Rule: One Spec Per Shippable Change

Default to feature-level specs.

Figure: the default rule of one feature spec per shippable change.

The best argument for this isn't taste. It's delivery physics. According to this piece on granular stats and smaller delivery batches, the 2023 DORA research characterized elite teams as deploying more than 1,000 times per year, while low performers deployed between zero and one times per month. The same summary links smaller batch sizes with better flow and lower risk.

A feature spec maps cleanly to a small batch. A milestone spec usually doesn't.

That matters even more with AI coding agents. The broader the spec, the more likely the agent is to:

Invent glue logic between unrelated changes
Blur acceptance criteria into vague "done" states
Touch too many files in one pass
Hide mistakes inside a broad implementation sweep

If you want the shortest version of what spec-driven development actually is, it's this: you give the agent a bounded target with clear validation, not a big dream with implied steps.

What counts as one spec

Use one spec when the change has:

One reviewable intent. The code review has one main question.
One coherent acceptance surface. You can state what success looks like without mixing unrelated outcomes.
One rollback story. If it goes wrong, you know what to revert.
One obvious test boundary. The change can be verified as a unit.

A good feature spec is usually the smallest thing you can name without saying "and also."
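
The four checks above can be turned into a quick self-review before you write anything longer. The sketch below is only an illustration: the `ChangeProposal` fields, the counts you fill in, and the "and also" string check are assumptions of this example, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class ChangeProposal:
    """Hypothetical self-assessment of a proposed change (all fields are assumptions)."""
    summary: str              # one-sentence description of the change
    review_questions: int     # main questions a code reviewer would have to answer
    acceptance_surfaces: int  # unrelated groups of outcomes that define success
    rollback_units: int       # separate things you would revert if it went wrong
    test_boundaries: int      # separate units the change has to be verified as


def spec_size_problems(change: ChangeProposal) -> list[str]:
    """Return reasons the change is probably too big for a single spec."""
    problems = []
    if change.review_questions > 1:
        problems.append("more than one reviewable intent")
    if change.acceptance_surfaces > 1:
        problems.append("more than one acceptance surface")
    if change.rollback_units > 1:
        problems.append("more than one rollback story")
    if change.test_boundaries > 1:
        problems.append("more than one test boundary")
    if "and also" in change.summary.lower():
        problems.append("summary needs 'and also'")
    return problems


shortcut = ChangeProposal("Add keyboard shortcut to open command palette", 1, 1, 1, 1)
print(spec_size_problems(shortcut))  # prints []
```

An empty list means the change plausibly fits one spec. Any entry is a hint to split it or to move up to subsystem scope.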

Three concrete examples

Small change: add keyboard shortcut support to open the command palette.
One spec. Clear behavior. Easy tests.

Vertical slice: let new users sign up, verify email, and land on the onboarding screen.
Still one spec if the whole flow is one shippable behavior. Split it only if you'd review and ship the pieces independently.

Cross-cutting refactor: replace old auth token parsing across API middleware, jobs, and admin tools.
Not a feature spec. That probably wants subsystem scope.

Exceptions: When to Zoom Out

Rules are useful because they have sharp exceptions.

Figure: when to use subsystem or milestone specs for broad, far-reaching projects.

When subsystem beats feature

Use a subsystem spec when the agent needs shared architectural context to avoid local optimizations that break the whole change.

Common cases:

Cross-cutting refactor. Example: moving from one data access pattern to another across several modules.
Shared contract redesign. Example: changing validation, storage, and serialization rules together.
Core platform behavior. Example: reworking permissions, caching, or background job execution.

In systems engineering terms, this matches the split between high-level and detailed design. FHWA guidance, in its section on requirements traceability and design levels, distinguishes defining WHAT subsystems must do from HOW components work, and that maps neatly here. Milestone and subsystem framing help with the WHAT. Feature specs help the agent execute the HOW.

When milestone beats both

Use a milestone spec when detail is still moving and the primary task is planning, sequencing, and risk control.

Good milestone cases:

Early product scaffolding
A beta launch target
A broad outcome with several unknown implementation paths

Milestone specs are not where you stop. They're where you start. You use them to define outcome boundaries, then break implementation into subsystem or feature specs.

Don't hand a milestone spec straight to a coding agent and expect clean results. That's how you get vague code tied to vague goals.

The simple split is this:

Milestone specs reduce planning ambiguity
Subsystem specs reduce architectural thrash
Feature specs reduce implementation rework
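
The default rule and its two exceptions reduce to a short decision order. Here is a minimal sketch of it, assuming you can answer two yes/no questions about the work. The function and parameter names are hypothetical and chosen for this example.

```python
from enum import Enum


class SpecLevel(Enum):
    FEATURE = "feature"
    SUBSYSTEM = "subsystem"
    MILESTONE = "milestone"


def choose_spec_level(
    *, details_still_moving: bool, needs_shared_architecture_context: bool
) -> SpecLevel:
    """Pick a spec level using the default rule plus the two exceptions."""
    if details_still_moving:
        # Planning, sequencing, and risk control come first.
        return SpecLevel.MILESTONE
    if needs_shared_architecture_context:
        # Cross-cutting refactors, shared contracts, core platform behavior.
        return SpecLevel.SUBSYSTEM
    # Default: one shippable behavior change.
    return SpecLevel.FEATURE


def safe_to_hand_to_agent(level: SpecLevel) -> bool:
    """Milestone specs are a starting point, not direct input for a coding agent."""
    return level is not SpecLevel.MILESTONE


level = choose_spec_level(
    details_still_moving=False, needs_shared_architecture_context=True
)
print(level.value, safe_to_hand_to_agent(level))  # prints: subsystem True
```

Checking the milestone case first mirrors the advice above: while details are still moving, settle the outcome boundaries before writing anything an agent will implement.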

A Practical Spec Template for AI Agents

A good spec for an AI coding agent needs structure, not volume.

Figure: an AI agent blueprint with inputs, logic, memory, tools, and safety components.

The template

Use this:

TL;DR
One sentence. What is changing?
Scope boundaries
Use Addy Osmani's three buckets from his spec guide:
- Always do
  Safe defaults the agent should follow
- Ask first
  Changes that need approval
- Never do
  Hard boundaries
Subtasks
Short checklist. Logical order, not essay prose.
Acceptance criteria
Observable outcomes. No vague "works well" language.
Assumptions and risks
What you think is true. What might break.
Validation scenarios
Manual checks or test cases that prove the change.
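
If you keep specs as Markdown files, the template can also live as a small data structure that renders the sections and flags obvious gaps. This is a sketch under stated assumptions: the `Spec` class, the `VAGUE_PHRASES` list, and the sample values are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field

# Assumption: an illustrative list of phrases that signal a non-observable criterion.
VAGUE_PHRASES = ("works well", "looks good", "as expected")


def _bullets(items: list[str]) -> str:
    """Render items as a Markdown bullet list, with a placeholder when empty."""
    return "\n".join(f"- {item}" for item in items) or "- (none)"


@dataclass
class Spec:
    """One spec, with a field per section of the template."""
    tldr: str
    always_do: list[str] = field(default_factory=list)
    ask_first: list[str] = field(default_factory=list)
    never_do: list[str] = field(default_factory=list)
    subtasks: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    assumptions_and_risks: list[str] = field(default_factory=list)
    validation_scenarios: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List gaps that would leave the agent without a clear target."""
        issues = []
        if not self.tldr.strip():
            issues.append("TL;DR is empty")
        if not self.acceptance_criteria:
            issues.append("no acceptance criteria")
        for criterion in self.acceptance_criteria:
            if any(phrase in criterion.lower() for phrase in VAGUE_PHRASES):
                issues.append(f"vague acceptance criterion: {criterion}")
        if not self.validation_scenarios:
            issues.append("no validation scenarios")
        return issues

    def to_markdown(self) -> str:
        """Render the spec in the section order of the template."""
        sections = [
            ("TL;DR", self.tldr),
            ("Always do", _bullets(self.always_do)),
            ("Ask first", _bullets(self.ask_first)),
            ("Never do", _bullets(self.never_do)),
            ("Subtasks", _bullets(self.subtasks)),
            ("Acceptance criteria", _bullets(self.acceptance_criteria)),
            ("Assumptions and risks", _bullets(self.assumptions_and_risks)),
            ("Validation scenarios", _bullets(self.validation_scenarios)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


# Hypothetical draft: the criterion is vague and no validation scenarios exist yet.
draft = Spec(
    tldr="Add a forgot-password link to the login screen.",
    never_do=["Change the session token format"],
    acceptance_criteria=["The reset flow works well"],
)
print(draft.problems())
```

Running `problems()` before handing the rendered Markdown to an agent is a cheap way to catch the "works well" style of criterion the template warns against.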

If you want a structured starting point, a plain product requirements document template is better than a blank markdown file.

For builders looking at the broader craft of AI agent development, the useful lesson is the same. Agents perform better when the artifact they consume has explicit constraints, explicit tests, and explicit failure boundaries.

Adjust granularity as your process matures

Birgitta Böckeler describes a maturity path from spec-first to spec-anchored to spec-as-source in her spec-driven development analysis on Martin Fowler's site. That matters for granularity.

Early on, you'll often write coarser specs because your system and workflow are still unstable. Later, you can go finer because you trust your boundaries, tests, and review process.

That's normal. What fails is pretending your process is mature when it isn't.

Building Your Spec-Driven Workflow

You don't need a giant framework to make this work. You need a repeatable loop.

The manual version

Plenty of solo builders do this with:

Markdown files in /specs
GitHub Spec Kit
OpenSpec
Kiro
Traycer
Vibe Kanban
BMAD-style workflows

That route works. It also creates drift if you're sloppy.

The loop is straightforward:

Write the spec
Hand it to Cursor, Claude Code, Codex, or Gemini
Review output against acceptance criteria
Update the spec if the assumptions changed
Ship or split further
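
The review step is where the loop branches, so it helps to make the branch explicit. A minimal sketch follows, assuming the review produces three yes/no answers. The `Review` fields and the "send back to agent" fallback are assumptions added for this example and go beyond the five steps listed above.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical outcome of reviewing agent output against the spec."""
    criteria_passed: bool             # every acceptance criterion is met
    assumptions_changed: bool         # something the spec assumed turned out false
    still_one_reviewable_change: bool # the work still fits a single PR-sized change


def next_action(review: Review) -> str:
    """Decide the next step of the loop after a review."""
    if review.assumptions_changed:
        return "update spec"    # keep the spec from freezing and rotting
    if not review.still_one_reviewable_change:
        return "split further"  # the spec turned out to be too big
    if review.criteria_passed:
        return "ship"
    return "send back to agent"  # assumption: retry against the same spec


print(next_action(Review(True, False, True)))  # prints: ship
```

Putting the assumptions check first reflects the point about drift: a spec that no longer matches reality should be updated before anything ships.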

To be fair, the anti-spec crowd is partly right. If you keep specs frozen, they rot. If you over-spec, you are writing pseudocode with extra steps. If you skip acceptance criteria, you're just prompting with nicer formatting.

The opinionated version

If you want less manual setup, use a tool that focuses on spec generation and repo context, not magical claims about full automation.

Tekk.coach fits that narrower job. It connects to your GitHub repo, reads the codebase for context, runs a structured interview, and produces a codebase-aware spec you then hand to Cursor, Claude Code, Codex, or Gemini yourself. It doesn't create PRs. It doesn't orchestrate external coding agents. The async piece is the CTO loop, one tick per workspace, emitting at most one proposal or question per tick.

That narrower shape is the right one for solo builders. Better spec in. Better agent output out.

If you want a cleaner spec workflow without more ceremony, try Tekk.coach. Connect your GitHub repo. Describe the problem. Get a structured spec. Ship.

Part of the Spec-Driven Development pillar — a 52-page honest playbook on shipping with AI coding agents.