Use one spec per shippable behavior change. In a 2023 McKinsey survey of 1,994 business leaders across 105 countries, 42% said they were regularly using generative AI tools in at least one function, and that makes small, clear specs more valuable because agents handle bounded work better than sprawling milestones.
A lot of the backlash against spec-driven development is deserved. People call it "waterfall with markdown", a productivity trap, frozen docs, token-burning ceremony. That happens when the spec is the wrong size. Too big, and your coding agent gets lost in mushy goals. Too small, and it loses the context that makes the change coherent.
The useful rule is simple. Write the smallest spec you would still review as one independent change. Not one milestone by default. Not one doc per tiny task. One shippable behavior change.
That tension shows up constantly when you build with Cursor, Claude Code, Codex, or Gemini. One Reddit user trying to make spec-driven development work with AI agents put it plainly: "The biggest hurdle I'm facing is determining the right level of detail. Too vague and the AI hallucinates, too detailed and I might as well write the code myself." That is the problem, not whether specs are good in the abstract. If you've hit the vibe coding wall and prompting wall that pushed people toward SDD, granularity is usually the bottleneck.
Table of Contents
- Why Spec Granularity Is The New Bottleneck
- The Three Levels of Spec Granularity
- The Default Rule One Spec Per Shippable Change
- Exceptions When to Zoom Out
- A Practical Spec Template for AI Agents
- Building Your Spec-Driven Workflow
Why Spec Granularity Is The New Bottleneck
Spec quality matters less than is often assumed. Spec size matters more.
If the doc covers a whole release, your AI agent has no clean target. Acceptance criteria turn into stakeholder language. Testing gets fuzzy. The agent starts making guesses across too many files, too many assumptions, too many hidden dependencies.
If the doc is too atomized, you get the opposite failure. You split one coherent behavior across five mini-specs. The agent follows each one precisely and misses the underlying intent.
Practical rule: If you wouldn't review the resulting code as one PR-sized change, the spec is probably the wrong size.
Many teams make a common error: they copy human-team requirements habits into AI-agent workflows. Humans can infer missing context from meetings, code history, and hallway chat. Agents can't. They need boundaries, not ceremony.
The requirements literature gives a better target than most AI tooling discourse does. IREB defines granularity as the degree of detail in a specification and recommends a level that is the "best reference for an unambiguous definition of software scope" in its article on functional requirements and their levels of granularity. That's the right standard. Not detailed for its own sake. Detailed enough that the scope is unambiguous.
The Three Levels of Spec Granularity
You really have three useful levels. Everything else is a variant.

Feature spec
A feature spec defines one user-visible or system-visible behavior change you can ship and verify on its own.
Example: add a forgot-password link and working reset flow entry point on the login screen.
Subsystem spec
A subsystem spec covers one bounded area of the system where several related changes need shared context.
Example: redesign the entire password recovery subsystem, including token handling, email templates, audit logging, and admin reset rules.
A house analogy works here. This is the plumbing plan, not one faucet.
A useful reference point for boundaries is Addy Osmani's guide to writing a good spec for AI agents, especially his three-tier boundary system. It pushes you to define what the agent can always do, what needs approval, and what it must not do.
After you've got the mental model, the short video below is worth a watch.
Milestone spec
A milestone spec defines an outcome, not an implementation slice.
Example: ship V1 authentication for beta launch.
That can be useful for planning. It's usually bad as the direct input to an AI coding agent.
| Level | Best for | Risk |
|---|---|---|
| Feature | implementation and testing | too narrow if change is cross-cutting |
| Subsystem | coordinated changes in one bounded area | can sprawl if scope isn't capped |
| Milestone | planning and sequencing | too vague for direct coding |
The Default Rule One Spec Per Shippable Change
Default to feature-level specs.

The best argument for this isn't taste. It's delivery physics. In the 2023 DORA research, elite teams were defined by deployment performance of more than 1,000 deployments per year, while low performers deployed between zero and one times per month. That work also linked smaller batch sizes with better flow and lower risk, as summarized in this piece on granular stats and smaller delivery batches.
A feature spec maps cleanly to a small batch. A milestone spec usually doesn't.
That matters even more with AI coding agents. The broader the spec, the more likely the agent is to:
- Invent glue logic between unrelated changes
- Blur acceptance criteria into vague "done" states
- Touch too many files in one pass
- Hide mistakes inside a broad implementation sweep
If you want the shortest version of what spec-driven development actually is, it's this: you give the agent a bounded target with clear validation, not a big dream with implied steps.
What counts as one spec
Use one spec when the change has:
- One reviewable intent. The code review has one main question.
- One coherent acceptance surface. You can state what success looks like without mixing unrelated outcomes.
- One rollback story. If it goes wrong, you know what to revert.
- One obvious test boundary. The change can be verified as a unit.
A good feature spec is usually the smallest thing you can name without saying "and also."
Three concrete examples
Small change Add keyboard shortcut support to open command palette.
One spec. Clear behavior. Easy tests.Vertical slice Let new users sign up, verify email, and land on onboarding screen.
Still one spec if the whole flow is one shippable behavior. Split it only if you'd review and ship the pieces independently.Cross-cutting refactor Replace old auth token parsing across API middleware, jobs, and admin tools.
Not a feature spec. That probably wants subsystem scope.
Exceptions When to Zoom Out
Rules are useful because they have sharp exceptions.

When subsystem beats feature
Use a subsystem spec when the agent needs shared architectural context to avoid local optimizations that break the whole change.
Common cases:
- Cross-cutting refactor. Example: moving from one data access pattern to another across several modules.
- Shared contract redesign. Example: changing validation, storage, and serialization rules together.
- Core platform behavior. Example: reworking permissions, caching, or background job execution.
In systems engineering terms, this matches the split between high-level and detailed design. FHWA guidance distinguishes defining WHAT subsystems must do from HOW components work, and that maps neatly here in its section on requirements traceability and design levels. Milestones and subsystem framing help with the WHAT. Feature specs help the agent execute the HOW.
When milestone beats both
Use a milestone spec when detail is still moving and the primary task is planning, sequencing, and risk control.
Good milestone cases:
- Early product scaffolding
- A beta launch target
- A broad outcome with several unknown implementation paths
Milestone specs are not where you stop. They're where you start. You use them to define outcome boundaries, then break implementation into subsystem or feature specs.
Don't hand a milestone spec straight to a coding agent and expect clean results. That's how you get vague code tied to vague goals.
The simple split is this:
- Milestone specs reduce planning ambiguity
- Subsystem specs reduce architectural thrash
- Feature specs reduce implementation rework
A Practical Spec Template for AI Agents
A good spec for an AI coding agent needs structure, not volume.

The template
Use this:
TL;DR
One sentence. What is changing?Scope boundaries
Use Addy Osmani's three buckets from his spec guide:- Always do
Safe defaults the agent should follow - Ask first
Changes that need approval - Never do
Hard boundaries
- Always do
Subtasks
Short checklist. Logical order, not essay prose.Acceptance criteria
Observable outcomes. No vague "works well" language.Assumptions and risks
What you think is true. What might break.Validation scenarios
Manual checks or test cases that prove the change.
If you want a structured starting point, a plain product requirements document template is better than a blank markdown file.
For builders looking at the broader craft of AI agent development, the useful lesson is the same. Agents perform better when the artifact they consume has explicit constraints, explicit tests, and explicit failure boundaries.
Adjust granularity as your process matures
Birgitta Böckeler describes a maturity path from spec-first to spec-anchored to spec-as-source in her spec-driven development analysis on Martin Fowler. That matters for granularity.
Early on, you'll often write coarser specs because your system and workflow are still unstable. Later, you can go finer because you trust your boundaries, tests, and review process.
That's normal. What fails is pretending your process is mature when it isn't.
Building Your Spec-Driven Workflow
You don't need a giant framework to make this work. You need a repeatable loop.

The manual version
Plenty of solo builders do this with:
- Markdown files in
/specs - GitHub Spec Kit
- OpenSpec
- Kiro
- Traycer
- Vibe Kanban
- BMAD-style workflows
That route works. It also creates drift if you're sloppy.
The loop is straightforward:
- Write the spec
- Hand it to Cursor, Claude Code, Codex, or Gemini
- Review output against acceptance criteria
- Update the spec if the assumptions changed
- Ship or split further
For article critique, the anti-spec crowd is partly right. If you keep specs frozen, they rot. If you over-spec, you are writing pseudocode with extra steps. If you skip acceptance criteria, you're just prompting with nicer formatting.
The opinionated version
If you want less manual setup, use a tool that focuses on spec generation and repo context, not magical claims about full automation.
Tekk.coach fits that narrower job. It connects to your GitHub repo, reads the codebase for context, runs a structured interview, and produces a codebase-aware spec you then hand to Cursor, Claude Code, Codex, or Gemini yourself. It doesn't create PRs. It doesn't orchestrate external coding agents. The async piece is the CTO loop, one tick per workspace, emitting at most one proposal or question per tick.
That narrower shape is the right one for solo builders. Better spec in. Better agent output out.
If you want a cleaner spec workflow without more ceremony, try Tekk.coach. Connect your GitHub repo. Describe the problem. Get a structured spec. Ship.
Part of the Spec-Driven Development pillar — a 52-page honest playbook on shipping with AI coding agents.

