Multi-Agent Workflows in Claude Code: Orchestrating Skills, Subagents, and MCP Together

The composition patterns that turn skills, subagents, and MCP into one pipeline. Serial chains, fan-out, context boundaries, and a worked example.

Most people who run Claude Code workflows stop at one good prompt and a CLAUDE.md file. That works until the task gets wide. A bug that spans forty files, a migration across a hundred thousand lines, a security pass that has to touch every route. At that point a single agent in a single context window is the wrong shape for the job, and the real question is how you wire several components into one pipeline that holds together across reruns.

So this piece is about composition. Not the mechanics of any one piece, but how skills, subagents, MCP, and worktrees fit into a Claude Code workflow you can run twice and trust both times. If you want the deep version of any single component, the posts on how subagents work and on SKILL.md vs MCP go further than I will here.

Four components, four jobs

Claude Code gives you four building blocks, and they are not interchangeable. Knowing which does what is most of the battle.

Skills are Markdown procedures. A SKILL.md file holds a checklist or a multi-step recipe, and per the Claude docs, "a skill's body loads only when it's used, so long reference material costs almost nothing until you need it." Skills run in your main context by default. You invoke one with /skill-name, or you let Claude pick it up when the description matches.

Subagents are workers with their own context window. The subagents doc is blunt about the value: each one "runs in its own context window with a custom system prompt, specific tool access, and independent permissions," and "the verbose output stays in the subagent's context while only the relevant summary returns to your main conversation." I think that last clause is the whole reason they exist.

MCP is the interface layer. An MCP server is an external process that exposes tools and data over JSON-RPC: your database, your issue tracker, a browser, an internal API Claude has never heard of. It is how the agent reaches things that live outside the repo.

Worktrees are isolation for the filesystem. A subagent set to isolation: worktree gets "an isolated copy of the repository" in a separate git worktree, so two workers editing files never overwrite each other.

So the division of labor is clean enough to memorize. Skills tell the model what to do. Subagents decide where the work happens and what comes back. MCP decides what the work can touch, and worktrees keep parallel writers off each other's toes. Orchestration is mostly just pointing each one at the right job.

The composition patterns behind Claude Code workflows

Four patterns cover almost everything I build.

A serial skill chain is the simplest of them. You write each stage as a skill, then run them in order so the output of one feeds the next: lint, then fix, then test, then summarize. The docs describe this directly. "Ask Claude to use subagents in sequence. Each subagent completes its task and returns results to Claude, which then passes relevant context to the next." Because the handoff is a summary and not a transcript, your main thread stays readable even across a five-stage run.
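To make the handoff shape concrete, here is a minimal Python sketch. It is a model of the idea, not Claude Code's API: each stage is a plain function standing in for a skill or subagent, and the `StageResult` type and the stage names are hypothetical. The point is that only the summary crosses the boundary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StageResult:
    summary: str      # the short handoff the next stage sees
    transcript: str   # verbose output that stays behind the boundary


# Hypothetical stage signature: takes the previous summary, returns a result.
Stage = Callable[[str], StageResult]


def run_chain(stages: List[Stage], initial: str) -> List[str]:
    """Run stages in order, forwarding only each stage's summary."""
    summaries = []
    handoff = initial
    for stage in stages:
        result = stage(handoff)
        handoff = result.summary   # the transcript is deliberately dropped
        summaries.append(handoff)
    return summaries


def lint(prev: str) -> StageResult:
    return StageResult("lint: 3 warnings in api/", "...long linter output...")


def fix(prev: str) -> StageResult:
    return StageResult(f"fix: addressed [{prev}]", "...diffs...")


def run_tests(prev: str) -> StageResult:
    return StageResult(f"tests: green after [{prev}]", "...test log...")
```

Calling `run_chain([lint, fix, run_tests], "start")` gives you three one-line summaries, which is roughly what your main thread should look like after a chained run.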

Fan-out is where subagents earn their cost. For independent investigations you "spawn multiple subagents to work simultaneously," each exploring its area, after which Claude synthesizes the findings. What makes this safe is structural: subagents cannot spawn other subagents, so your fan-out stays one level deep and never recurses into a fork bomb of agents. The catch the docs flag is real, though. "Running many subagents that each return detailed results can consume significant context." You want to fan out wide but have each worker return something narrow.
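The "wide out, narrow back" rule is easy to sketch in Python. This is an analogy under assumptions, not how Claude Code spawns subagents: threads stand in for workers, the callables are hypothetical investigations, and `MAX_SUMMARY_CHARS` is a budget I made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

MAX_SUMMARY_CHARS = 200  # assumed per-worker budget; pick your own


def fan_out(tasks: Dict[str, Callable[[], str]], max_workers: int = 4) -> Dict[str, str]:
    """Run independent investigations in parallel, keeping a capped summary of each."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One level deep: workers never submit further work to the pool.
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: fut.result()[:MAX_SUMMARY_CHARS] for name, fut in futures.items()}
```

A hard character cap is the crudest possible way to keep returns narrow. In a real run you get a better result by asking each worker for a short findings list in its prompt, but the budget idea is the same.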

MCP as the interface layer is a pattern in its own right, not just a feature you happen to use. You can scope an MCP server to a single subagent through its frontmatter, and the docs spell out why that matters: "To keep an MCP server out of the main conversation entirely and avoid its tool descriptions consuming context there, define it inline here rather than in .mcp.json. The subagent gets the tools; the parent conversation does not." So your database-touching worker carries the Postgres MCP server, your main thread stays lean, and the tool surface lives exactly where it gets used.

Worktree isolation is the pattern for parallel implementation. When three subagents each write code, you give each one isolation: worktree so their edits land in separate copies of the repo. The branch comes off your default branch, and the worktree "is automatically cleaned up if the subagent makes no changes." Read-heavy fan-out does not need this. Write-heavy fan-out does, though, or you will spend your afternoon untangling clobbered files.
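If you want a feel for what that isolation amounts to at the git level, this Python sketch builds the equivalent manual `git worktree add` calls. Claude Code manages its own worktrees when you set isolation: worktree, so treat this as an illustration only: the directory naming, the `agent/` branch prefix, and `main` as the default branch are all my assumptions.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_add_command(repo: Path, worker: str, base: str = "main") -> List[str]:
    """Build the `git worktree add` call that gives one worker its own checkout."""
    path = repo.parent / f"{repo.name}-{worker}"   # hypothetical naming scheme
    branch = f"agent/{worker}"                     # hypothetical branch prefix
    # git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def create_worktrees(repo: Path, workers: List[str], base: str = "main") -> None:
    """Create one worktree per worker; raises if any git call fails."""
    for worker in workers:
        subprocess.run(worktree_add_command(repo, worker, base), check=True)
```

Each worker ends up with its own directory and its own branch off the base, which is why two writers cannot clobber each other's files.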

Context management is the actual hard part

The reason these patterns work is the same reason they are easy to get wrong. Every boundary you cross is a context decision.

A non-fork subagent starts cold. The docs are explicit that it "does not see your conversation history, the skills you've already invoked, or the files Claude has already read." Claude writes a delegation message that summarizes the task, and the worker proceeds from there plus its CLAUDE.md and git status. That isolation is the feature, but it is also the trap, because if a rule lives only in your head and never made it into the delegation prompt, the subagent never learns it. The docs give the example: a rule like "ignore the vendor/ directory" has to be restated in the prompt you hand to Claude when delegating.
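One way to think about the cold start is as a function with no access to outer scope: anything the worker needs has to be passed in. A small Python sketch of what a complete delegation message carries, with the caveat that in Claude Code it is Claude that writes this message and the function name and layout here are hypothetical:

```python
from typing import List


def build_delegation_prompt(task: str, standing_rules: List[str]) -> str:
    """Compose a delegation message that restates rules the worker cannot otherwise see."""
    lines = [f"Task: {task}"]
    if standing_rules:
        lines.append("Constraints (restated because you start without the conversation history):")
        lines.extend(f"- {rule}" for rule in standing_rules)
    return "\n".join(lines)


prompt = build_delegation_prompt(
    "Find every unauthenticated route",
    ["ignore the vendor/ directory"],
)
```

If a rule is missing from `standing_rules`, it is missing from the worker's world. That is the whole trap in one line.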

Forks are the escape hatch for when isolation costs too much. A fork "inherits the entire conversation so far instead of starting fresh," which drops the input isolation but keeps the output isolation, so the fork's own tool calls stay out of your conversation and only its final result comes back. Use a fork when a named subagent would need a paragraph of background to be useful. Use a clean subagent when the task is genuinely self-contained.

Skills cross the boundary in their own way. Set context: fork in a skill's frontmatter and "the skill content becomes the prompt that drives the subagent." Pair it with agent: Explore and the forked skill sees only the SKILL.md content and the agent's own system prompt, because Explore skips CLAUDE.md to stay cheap. That combination, a skill as the task and a read-only agent as the runtime, is the cleanest way I know to run heavy research without flooding your main window.
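For a sense of how little configuration that combination takes, here is a Python sketch that renders such a SKILL.md. The `context: fork` and `agent: Explore` lines come straight from the docs quoted above, and `name` and `description` are the usual SKILL.md frontmatter fields. The skill name and body in the usage note are invented, and the function assumes plain string values with no YAML special characters.

```python
def render_skill(name: str, description: str, body: str) -> str:
    """Render a SKILL.md whose frontmatter forks it into an Explore subagent."""
    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {description}",
        "context: fork",    # the skill content becomes the subagent's prompt
        "agent: Explore",   # read-only research agent named in the text
        "---",
    ])
    return f"{frontmatter}\n\n{body.strip()}\n"
```

Something like `render_skill("deep-research", "Survey how a module is used across the repo", "1. List call sites.\n2. Summarize patterns.")` produces a file you could drop into a skill directory and version alongside the code.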

This is where durable skills stop being a nice-to-have. A workflow is only as stable as its stages, and a stage written as a throwaway prompt drifts every time you rerun it, while a stage written as a versioned SKILL.md does not. If you want reusable skills that behave the same on Tuesday as they did on Monday but you would rather not hand-write the YAML and the procedure yourself, Knack turns a short interview into a shippable Anthropic-format skill. The building blocks of a stable pipeline are durable, reusable skills, and that is exactly what it produces.

Error handling across boundaries

Subagent failure modes are quiet, so you have to design for them up front. Background subagents "run with the permissions already granted in the session and auto-deny any tool call that would otherwise prompt," which means a worker can stall on a permission it cannot ask for. The documented recovery is to "start a new foreground subagent with the same task to retry with interactive prompts." In practice, pre-approving the operations your workers need before you fan out removes most of these stalls.

For high-stakes work, the orchestration layer can build verification in. Anthropic's Opus 4.8 announcement describes Claude planning the work and then running "hundreds of parallel subagents in a single session" with outputs verified before they reach you. This is the new Dynamic Workflows research preview, where, per InfoQ's coverage, Claude "writes a JavaScript orchestration script on the fly" and a separate runtime executes it in the background. Reporting on the preview puts hard ceilings on it: up to 16 agents running concurrently and a thousand total per execution. Even if you never touch the preview, the verification pattern is the part worth stealing. When the cost of an error is high, run independent attempts, have them critique each other, and keep only the conclusions that survive the cross-check.
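The simplest version of that cross-check is an agreement vote, sketched below in Python. This is my own reduction of the pattern, not how the Dynamic Workflows preview verifies anything: it skips the critique step entirely and just keeps conclusions that enough independent attempts reached. The `min_agreement` threshold is an assumption you would tune.

```python
from collections import Counter
from typing import Iterable, List, Set


def cross_check(attempts: Iterable[Set[str]], min_agreement: int = 2) -> List[str]:
    """Keep only conclusions reported by at least `min_agreement` independent attempts."""
    counts: Counter = Counter()
    for findings in attempts:
        counts.update(findings)   # a set, so each attempt votes once per conclusion
    return sorted(c for c, n in counts.items() if n >= min_agreement)
```

With three attempts reporting `{"a", "b"}`, `{"a"}`, and `{"b", "c"}`, the default threshold keeps `a` and `b` and drops `c`, the finding only one worker saw.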

A worked example: a security audit pipeline

Say you want to audit a mid-size web app before a release. Here is how the pieces compose.

You start in your main session and write the audit criteria as a skill, /security-audit, so the same checklist runs every time. The skill's first move is a fan-out. Claude spawns three subagents in parallel, one per domain: auth and session handling, input validation and injection, dependency and secret exposure. Each gets a focused prompt and read-only tools, because at this stage nobody should be editing anything. The auth worker carries a database MCP server scoped to its frontmatter so it can inspect the actual schema, while your main thread never loads those tool descriptions.

Each subagent burns through hundreds of files in its own context and returns a short findings list. Three verbose investigations come back as three tight summaries, and your main window stays readable. Claude synthesizes the three lists, dedupes the overlap, and ranks by severity. For the high-severity items you do not trust a single pass, so you run the verification pattern: a second set of workers re-checks the top findings independently, and you keep only what survives the cross-check.
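The synthesis step is the most mechanical part of the run, so it is worth seeing as code. A Python sketch, assuming each worker returns findings with a file, an issue label, and a severity: the `Finding` shape and the four-level severity scale are hypothetical, and in a real session Claude does this merge in prose, not through a function.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}  # assumed scale


@dataclass(frozen=True)
class Finding:
    file: str
    issue: str
    severity: str
    source: str  # which worker reported it


def synthesize(lists: Iterable[List[Finding]]) -> List[Finding]:
    """Merge worker findings, dedupe on (file, issue), keep the worst severity, rank."""
    best: Dict[Tuple[str, str], Finding] = {}
    for findings in lists:
        for f in findings:
            key = (f.file, f.issue)
            if key not in best or SEVERITY_ORDER[f.severity] < SEVERITY_ORDER[best[key].severity]:
                best[key] = f
    return sorted(best.values(), key=lambda f: (SEVERITY_ORDER[f.severity], f.file, f.issue))
```

When two workers flag the same file and issue at different severities, the sketch keeps the more severe report, on the theory that a release audit should err toward the pessimist.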

Then the work turns to writing. Fixes go to subagents with isolation: worktree, each in its own copy of the repo, so the auth patch and the validation patch never touch the same working tree. They run the existing test suite in their worktrees, and only the diffs that pass come back. You review the merged result.

The whole run is one pipeline. A skill defines it, fan-out parallelizes it, MCP gives the workers reach, verification catches the expensive mistakes, and worktrees keep the writers separate. When it spawns a dozen background workers, you do not babysit a dozen terminals. You watch the agent view dashboard, a single pane that, per the Claude blog, shows each session's status and which ones produced deployable changes.

The skill is the reusable artifact here. The fan-out, the MCP scoping, the worktrees, those are runtime decisions Claude makes from the skill's instructions, so the better your skills get, the less orchestration you have to hand-steer. Write the audit once as a durable SKILL.md, version it, hand it to a teammate, and the pipeline travels with the file. I think that is the real line between a clever one-off session and a workflow you can run every release.

So start with the patterns, not the preview. Fan-out for research, worktrees for parallel writes, MCP scoped to the worker that needs it, skills as the stages. Get those composing cleanly on a real task this week, and the thousand-agent stuff is just a bigger dial on a machine you already know how to drive.