knack
← all posts

How to Use Subagents in Claude Code Without Falling for the Parallelism Myth

Subagents exist for context-window isolation, not parallelism. Reach for one when a task would otherwise blow up the parent context. Reach for a Skill when the work is repeatable.

People install a subagent thinking it will run in parallel with the parent and shave wall-clock time off a long task. Then they call it through the Agent tool, watch the main thread block until the subagent returns, and conclude the feature is broken. It isn't. The subagent is doing what it was built to do, which is context-window isolation. Concurrency is a separate feature that sometimes piggybacks on the same primitive. If you want subagents in Claude Code to actually pay off, start with that distinction.

A subagent in Claude Code is a separate Claude instance spawned from a parent session. It has its own context window, its own system prompt, its own tool allowlist, and (optionally) its own model. The parent calls it via the Agent tool, which is the renamed Task tool with the old name still aliased for backward compatibility. The parent passes a prompt string. The subagent runs to completion. It returns one final message. Everything it read, every search result, every 4000-line log file it grepped through, stays in the subagent's context and never touches the parent's.

That last sentence is the entire reason the feature exists.

Where they live

Subagents are Markdown files with YAML frontmatter, scanned recursively from two main locations:

  • .claude/agents/<name>.md for project-scoped agents (check these into git)
  • ~/.claude/agents/<name>.md for personal agents available across every project

The docs spec lists more scopes (plugin, managed, --agents CLI JSON) but those two cover 95% of normal use. A minimal definition:

---
name: code-reviewer
description: Reviews a PR diff and returns concise feedback. Use after the user asks for a review of a specific PR number.
tools: Bash, Read, Grep, Glob
model: sonnet
---

You are a senior code reviewer. Given a PR number, fetch the diff with
`gh pr diff <number>`, read any files needed for context, and return a
review under 200 words. Cover: correctness bugs, security issues,
obvious perf footguns. Skip nits. Skip praise. Return one block of
prose plus a verdict line: APPROVE / REQUEST_CHANGES / COMMENT.

Only name and description are required. tools is an allowlist; omit it and the subagent inherits everything the parent has. model accepts sonnet, opus, haiku, inherit, or a full ID like claude-opus-4-7. The rest of the frontmatter (depth limits, permission mode, skill preloading, worktree isolation) is optional and most people never touch it.

Three ways to invoke

The parent agent picks a subagent based on its description field, which is why that field should read like instructions to a router and not marketing copy. Three invocation paths exist:

  1. @-mention in chat. Type @code-reviewer review PR #443 and the parent dispatches.
  2. Explicit Agent tool call. The parent constructs an Agent tool call with agent_type: "code-reviewer" and a prompt. You see this in transcripts as a single tool block that blocks until the subagent returns.
  3. Implicit dispatch. The parent reads description, decides "this fits," and routes the next user message there without being asked. This is where most people get confused, because the dispatch can feel invisible until you check the transcript.

The three real use cases

Context-heavy exploration. You ask Claude to find every place a deprecated auth helper is called across a 300-file monorepo. Done in the main thread, the grep output, the 40 file reads, and the chain of dead ends all stay in your context. Done via a subagent, you get back "12 call sites, here they are, here's the migration pattern." The 80k tokens of intermediate noise die with the subagent.

Bounded independent work. A research agent that pulls three competitor docs pages and summarizes pricing. A security scanner that runs npm audit, reads the lockfile, and reports. A test-coverage reader that runs the suite and returns the gaps. Each finishes, returns its summary, and the parent moves on with a clean context.

Genuine parallelism, with a footnote. You can launch multiple Agent tool calls in one assistant turn and they do run concurrently. This is how Anthropic's own engineering writeup recommends fanning out research. The footnote: each parallel subagent burns its own context window, so token spend scales linearly with branch count. Fan out to three or four when the work genuinely partitions. Past that, the parent spends more turns reconciling the returns than the parallelism saved.

The anti-patterns

Using a subagent for a repeatable workflow that should be a Skill. A subagent is a runtime delegation. A Skill is reusable knowledge that loads on demand into the agent that needs it. If you find yourself writing the same 800-word system prompt into a subagent file every time you spin up a new project, that prompt belongs in a SKILL.md instead, which is the side of this triangle that Knack handles. Subagents call Skills; Skills don't replace subagents, and the reverse mistake costs you portability.

Using a subagent for a three-line shortcut. If your subagent's system prompt is "run npm test and tell me what failed," that is a slash command. Slash commands live in .claude/commands/ and execute in the parent's context with zero spawn overhead. No new model call. No round-trip.

Spawning subagents that spawn subagents that spawn subagents. The Agent tool allows recursive dispatch, and people discover this and immediately build a four-level tree that takes nine minutes to return a two-paragraph answer. Cap your depth at one level unless you have a reason not to. The frontmatter has a turn-limit field for exactly this.

The decision triangle

Three one-line tests, in order:

  • Will I run this verbatim again next week? If yes, write a Skill. Skills are the portable shape. Knack exists because the SKILL.md format outlives any one harness, and a Skill written today still works when Claude Code ships its next interface change.
  • Will this task generate output I won't reference again? If yes, subagent. The whole point is throwing away the intermediate context after extracting the signal.
  • Is this three lines of boilerplate I retype constantly? Slash command. Don't spawn a model to save four keystrokes.

For comparison, OpenAI's Codex shipped its own subagent system in March 2026 with a manager-worker model. The Codex framing is supervisory. You define tasks, dispatch them to cloud-sandboxed workers, and come back later. Claude Code's framing is local and synchronous. The parent waits, you watch the transcript, and the work happens in front of you. Same architectural primitive (isolated context, summarized return), different operator experience. Pick by which loop you want to live inside.

The shortest possible test for whether to reach for a subagent: if the work you are about to do would make you wish you had a fresh conversation halfway through, spawn one now.