Stop Prompting, Start Orchestrating: How to Build a Multi-Agent Coding Pipeline with Claude Code
Most developers use AI coding agents as glorified autocomplete. Here's how to architect a 6-stage pipeline with specialized sub-agents that plan, implement, test, and review code — with humans in the loop where it matters.
Anyone who has shipped AI features in production knows how quickly things break down when context is poorly defined. At scale, or even across a few dozen iterations, hallucinations multiply fast in generalist, multi-function bots.
The solution that consistently works in production AI architectures is decomposition: splitting work into highly specialized sub-agents, each responsible for one small, well-defined task. There's no reason AI coding agents should be any different.
If you haven't experimented with a more structured framework for Claude Code, I'd strongly recommend trying Superpowers, get-shit-done, or bmad. They'll likely change how you think about AI-assisted development, especially when paired with project-specific CLAUDE.md files.
That said, this article is for those ready to go further and build an optimized multi-stage pipeline tailored for your project.
Prerequisites: Familiarity with Claude Code fundamentals, including slash commands, skills, and sub-agents. Knowledge of git worktrees is a bonus, but not required.
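Worktrees matter here because later stages run agents in parallel. One way to keep concurrent agents from stepping on each other's edits is to give each its own worktree — a sketch, with hypothetical branch names (the article doesn't prescribe this setup):

```shell
# Isolate each parallel agent in its own worktree so concurrent
# edits never collide (branch names are illustrative).
set -e
git init -q pipeline-demo
cd pipeline-demo
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -q -m "init"

# One worktree per parallel implementation agent (see Stage 3):
git worktree add -q ../agent-backend  -b feature/backend
git worktree add -q ../agent-frontend -b feature/frontend
git worktree list
```

Each agent then works in its own directory on its own branch, and you merge when a stage completes.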
The Orchestrator Agent
When developing conversational bots with LLMs, it's common to have a coordinator agent that manages sequencing, decides when to hand off to specialized agents, and knows when to pause for human input. The same pattern applies here.
The orchestrator has exactly one job: know the sequence and enforce checkpoints, including human checkpoints. Yes, human intervention. You still need to know what's actually happening behind the scenes so you can adapt and course-correct as the implementation advances.
```
# Pipeline: Full-Stack Feature Development Orchestrator

You are an **orchestrator agent**. Your job is to drive a complete feature
implementation pipeline through discrete stages, each executed by specialized
agents. You coordinate work, enforce quality gates, and pause at checkpoints
for user review.

## Input

The user will describe a feature or task after invoking `/pipeline`. Capture
their description as $ARGUMENTS.

If $ARGUMENTS is empty, ask the user to describe the feature they want to
implement and wait for their response before proceeding.
```
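For orientation: in Claude Code, a custom slash command like `/pipeline` is just a markdown file in your repo. A minimal layout might look like this (the `pipeline.md` filename is my choice; the `.claude/commands/` location and root-level CLAUDE.md are Claude Code's conventions):

```
your-repo/
├── CLAUDE.md                # project-wide context loaded into every session
└── .claude/
    └── commands/
        └── pipeline.md      # this orchestrator prompt, exposed as /pipeline
```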
Each stage in the pipeline runs sequentially, with some stages supporting iteration loops, which I'll cover below.
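To make the sequencing concrete, here is the full stage list written out as data (a TypeScript sketch; the stage names mirror the stages described in this article, and the checkpoint flags are this pipeline's design choices, not anything Claude Code enforces):

```typescript
// Each stage the orchestrator walks through, in order.
// "parallel" means the stage fans out to multiple sub-agents at once;
// "humanCheckpoint" means the pipeline pauses for user approval afterward.
interface Stage {
  id: string;
  parallel?: boolean;
  humanCheckpoint?: boolean;
}

const pipeline: Stage[] = [
  { id: "0: clarifying questions", humanCheckpoint: true },
  { id: "1: spec-driven planning", parallel: true, humanCheckpoint: true },
  { id: "2: contracts", humanCheckpoint: true },
  { id: "3: implementation", parallel: true, humanCheckpoint: true },
  { id: "3.5: test planning" },
  { id: "4: integration tests" },
  { id: "5: e2e tests" },
  { id: "6: code reviews", parallel: true, humanCheckpoint: true },
];

// The orchestrator's whole job: run these in order, pause at checkpoints.
const checkpoints = pipeline.filter((s) => s.humanCheckpoint).length;
```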
Stage 0: Clarifying Questions
This is a stage I added recently, inspired by experimenting with Obra/Superpowers. Before any planning begins, the pipeline stops and asks you questions.
It sounds simple, but the impact is significant. When writing an initial prompt, you're already biased by your own mental model, which means you're poorly positioned to spot your own gaps. Delegating that review to an AI surfaces branches and edge cases you likely haven't considered. In practice, I get around 10 clarifying questions per pipeline run, and virtually all of them represent assumptions the AI would have silently made otherwise.
```
### STAGE 0: Clarifying Questions

Before creating any plan, you **must** ask the user clarifying questions.
Do NOT skip this stage.

**Your task**:

1. Read the user's feature description ($ARGUMENTS).
2. Launch an **Explore agent** (subagent_type: "Explore", thoroughness: "medium")
   to scan the codebase for existing patterns, models, routers, and components
   related to the feature.
3. Identify ambiguities, missing details, and unstated assumptions across these dimensions:
   - **Scope**: What's included vs. excluded? Are there overlapping features?
   - **Data model**: Which entities are involved? Should existing models be reused or extended?
   - **User flows**: Who are the actors? Are there multiple roles with different permissions?
   - **Business rules**: Validation rules, constraints, edge cases?
   - **UI/UX**: Specific layout, navigation, or component style preferences?
   - **Auth**: Is authentication required? Are any actions restricted?
   - **Integration**: Dependencies on existing features?
   - **Scope/MVP**: Should this be built incrementally?
4. Present questions in a numbered list, grouped by category.
5. **Wait for the user's answers.** Do NOT proceed to Stage 1 until they respond.
6. Incorporate answers into the feature description passed to Stage 1.

Even for detailed requests, always ask at least 2-3 questions to confirm understanding.
```
One important detail: step 2 instructs the agent to explore your codebase before asking questions. This is where CLAUDE.md files earn their value. They tell the agent exactly where to look, rather than scanning the entire repo.
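The kind of CLAUDE.md entry that pays off here is a plain map of the repo. A fragment might look like this (illustrative; adapt the paths and notes to your own project):

```
## Repository layout

- packages/database: Mongoose models and Zod schemas (src/models, src/schemas)
- packages/trpc: tRPC routers, procedures, and context (src/routers)
- apps/web: Next.js frontend; all UI uses shadcn/ui with Tailwind CSS

When exploring for existing patterns, start in packages/trpc/src/routers.
```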
Stage 1: Spec-Driven Planning
This stage has three components.
The Plan Agent uses Spec-Driven Development to define the full scope of the implementation. If you're unfamiliar with the concept, Birgitta Böckeler has a good writeup on Martin Fowler's blog.
You can also read about Claude Code's built-in sub-agents; the Plan agent is purpose-built for this kind of structured output.
```
### STAGE 1: Spec-Driven Planning

Launch a **Plan agent** (subagent_type: "Plan") with the original feature
description plus all clarifying answers from Stage 0:

You are a spec-driven planner for a full-stack feature in the monorepo.

**Feature request**: {$ARGUMENTS + Stage 0 answers}

**Your task**:

1. Analyze the codebase to understand existing patterns, models, routers,
   and UI components.
2. Produce a detailed implementation spec covering:
   - **Summary**: What the feature does and why.
   - **Data model changes**: New or modified Mongoose schemas in packages/database.
   - **API contract**: New tRPC procedures in packages/trpc, with Zod input/output schemas.
   - **Backend logic**: Controllers, business rules, and data flow.
   - **Authentication** (if applicable): Use Better Auth with MongoDB/Mongoose adapter.
     Include auth methods, plugins, session strategy, route handler setup,
     environment variables, and auth UI pages.
   - **Frontend components**: Pages, components, hooks, and data fetching in apps/web.
     All UI must use shadcn/ui with Tailwind CSS.
   - **File manifest**: Every file to be created or modified, with its purpose.
   - **Edge cases and validation rules**.
   - **Testing strategy**: Integration and E2E test coverage.
3. Follow existing patterns: controller injection via tRPC context, Zod validation,
   publicProcedure, superjson transformer.
4. Output the spec as structured markdown.
```
Alongside the main plan agent, I run two additional agents in parallel: a UX reviewer and a UI reviewer. These evaluate the plan against established design patterns and surface assumptions the planner might have made. If you have a designer and a defined design system, replace these with agents that enforce your actual specs; they'll be far more precise than generalist reviewers.
After planning completes, there's a human checkpoint for feedback. This loop repeats until you approve the plan and tell the orchestrator to proceed.
Stage 2: Contracts
Before writing any real implementation code, this stage defines the contracts: schemas, models, and TypeScript types that both frontend and backend will share.
Establishing shared contracts first is what makes parallel frontend/backend development possible. Both sides can build against agreed-upon interfaces without waiting on each other.
tRPC is a natural fit here, since it shares types across the full stack without any extra ceremony.
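As a simplified illustration of why contracts-first enables parallel work, here is the shape of a Stage 2 output in plain TypeScript (in the real pipeline these would be Zod schemas and tRPC procedures; the names below are hypothetical):

```typescript
// Shared contract: both the frontend and backend agents build against
// these types, so neither side waits on the other's implementation.
export interface CreateTaskInput {
  title: string;
  dueDate?: string;
}

export interface Task extends CreateTaskInput {
  id: string;
  done: boolean;
}

// Stage 2 stub: the signature is final, the body is not.
// It compiles (so the frontend can typecheck against it) but does nothing yet.
export function createTask(input: CreateTaskInput): Task {
  throw new Error("Not implemented");
}
```

Stage 3's backend agent replaces the stub body; the frontend agent can wire up its calls and UI against the types immediately.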
```
### STAGE 2: Contracts Writer

Launch a **general-purpose agent**:

You are a contracts writer. You ONLY define data models, schemas, and
TypeScript types. You do NOT implement business logic, controllers, or routes.

**Spec**: {spec from Stage 1}

**Your task**:

1. In packages/database/src/schemas/, create or update Zod schemas for all
   input/output types in the spec.
2. In packages/database/src/models/, create or update Mongoose models matching
   the data model.
3. Update packages/database/src/index.ts to export all new schemas and models.
4. In packages/trpc/src/routers/, create procedure signatures only. Bodies
   should be stubs (e.g., throw new Error("Not implemented")) — enough to compile.
5. Update packages/trpc/src/index.ts and context.ts accordingly.

**Do NOT**:
- Implement controller methods or business logic
- Write database queries
- Touch any files in apps/

After writing contracts, run: pnpm build --filter @app/database --filter @app/trpc
```
This stage ends with another human checkpoint. The contracts are the foundation of your implementation, so it's worth taking the time to review them carefully before moving on.
Stage 3: Implementation
This is where the actual building happens. Two agents run in parallel, one for backend and one for frontend, both working from the shared contracts defined in Stage 2.
```
### STAGE 3: Implementation (Parallel)

Launch two agents simultaneously:

**Agent 3A — Backend Implementation** (general-purpose agent)
**Agent 3B — Frontend Implementation** (general-purpose agent)

### CHECKPOINT 3: Implementation Review

After both complete, present results and ask:

> Both backend and frontend implementations are complete.
>
> **Backend**: {summary}
> **Frontend**: {summary}
>
> You can:
> - Check the running app with `pnpm dev`
> - Request changes to specific files
> - Say **"proceed"** to continue to testing

Wait for user response. If feedback is given, re-run the relevant
agent(s) with the feedback.
```
Be deliberate here: the more precisely you describe your architecture in these agent prompts, the better the output. This is also a good place to build in self-improvement. You can instruct Claude to update the pipeline itself when new patterns are introduced to the project.
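The self-improvement instruction can be as small as one extra rule appended to the orchestrator prompt (a hypothetical fragment; the wording is mine):

```
### Self-improvement

After CHECKPOINT 3, if the implementation introduced a new convention
(a new package, directory, or pattern not covered by this pipeline),
propose an edit to this file and to CLAUDE.md so future runs follow it.
Ask the user before applying the edit.
```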
Stages 4 and 5: Automated Testing
The old excuse that automated tests take too much time is no longer valid. Writing tests with AI assistance is cheap, and their value has never been higher. Don't skip them.
My rule of thumb for test strategy:
- Integration tests for any layer that interacts with your data or domain layer. No mocks; run the actual flow and assert real behavior. I use Vitest for this, paired with Supertest for REST-layer testing.
- E2E tests for all frontend components. Playwright is my tool of choice.
Before writing tests, a Plan agent first defines what needs to be covered based on the actual implementation. Then two agents, one for integration and one for E2E, execute in sequence.
```
### STAGE 3.5: Test Planning

Launch a **Plan agent** to define test coverage based on the completed implementation.

### STAGE 4: Integration Tests

Launch a **general-purpose agent** to write integration tests per the test plan.

### STAGE 5: E2E Tests

Launch a **general-purpose agent** to write Playwright E2E tests per the test plan.
```
Stage 6: Code Reviews
Three reviewers run in parallel, covering Best Practices, Performance, and Security, and produce a consolidated report. Critically, none of these agents touch the code. They review and report only. This is an intentional constraint: agents that do one thing at a time produce more reliable output.
On the security front, a recent talk by Nicholas Carlini (a security researcher at Anthropic) at the Unprompted conference (watch here) is worth your time. The cost of finding CVEs with LLMs is dropping fast, and the risk surface is growing. Automated security review is quickly becoming essential, not optional.
```
### STAGE 6: Code Reviews (Parallel)

Launch three review agents simultaneously. They must NOT write or edit code.

**Agent 6A — Best Practices Reviewer**
**Agent 6B — Performance Reviewer**
**Agent 6C — Security Reviewer**

### CHECKPOINT 4: Code Review Results

After all three complete, present a consolidated report:

> Three code reviews completed.
>
> **Best Practices**: {X issues}
> **Performance**: {X issues}
> **Security**: {X issues}
>
> {Consolidated report, critical/high issues highlighted}
>
> Reply with:
> - **"fix all"** — Address all issues
> - **"fix critical"** — Fix critical/high severity only
> - Specific feedback on which issues to fix
> - **"skip"** — Proceed without changes

If fixes are requested, re-run the relevant Stage 3 agents, then re-run
Stage 6 reviewers to verify. Repeat until satisfied.
```
Final Thoughts
New code is cheap now. Theo (t3.gg) put it well in a recent video: the software development funnel is shifting, and lines of code are no longer the bottleneck. I think that's right, though it's more true for greenfield projects than for organizations still operating on legacy architectures. The industry will take time to fully adjust.
What remains constant, and becomes more important in this new world, is guardrails. The more constraints you build into your agents, the better the output. Make them write tests. Make them review each other's work from different angles. And keep humans in the loop at key decision points. The job isn't disappearing; it's shifting from writing code to coordinating the systems that write it.
The full pipeline is available here.