How do Claude Skills load? In three stages, and you only pay for the first one upfront. At startup Claude reads a tiny slice of every installed skill, the name and description, and nothing else. The body stays on disk until your request matches that description. The bundled files stay on disk until the body points at them. Anthropic calls this progressive disclosure, and it is why you can install fifty skills without watching your context window drain away.
That is the short answer. The mechanics are worth a few minutes, though, because they explain a failure mode most people hit and never manage to diagnose.
How do Claude skills load? The three levels, and what each costs
Anthropic's Agent Skills overview lays out three types of skill content, each loaded at a different time (platform.claude.com).
Level 1 is the metadata: the name and description from the YAML frontmatter at the top of SKILL.md. Claude loads this at startup and drops it into the system prompt. The docs put the cost at roughly 100 tokens per skill. That buys Claude one thing: knowing that a skill exists and when it might be relevant. It is cheap enough that the docs say plainly you can install many skills "without context penalty."
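To make level 1 concrete, here is a minimal Python sketch of pulling only the frontmatter out of a SKILL.md and ignoring the body. The skill name, the sample file contents, and the read_metadata helper are illustrative assumptions, and the hand-rolled parser only handles simple "key: value" lines. It is not how Claude itself reads the file.

```python
# Hypothetical SKILL.md contents; the name and body are made up for illustration.
SKILL_MD = """\
---
name: pdf-processing
description: Extract text and tables from PDF files. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---

# PDF Processing

Step-by-step instructions would go here.
"""


def read_metadata(skill_md: str) -> dict[str, str]:
    """Return only the frontmatter fields, ignoring the body entirely."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence: stop before the body
            return metadata
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            metadata[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' fence")


meta = read_metadata(SKILL_MD)
print(meta["name"])
print(meta["description"])
```

Everything below the closing fence is never touched, which is the point: at this level only two short fields are in play.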
Level 2 is the body of SKILL.md, the actual procedure. The workflow, the gotchas, the step-by-step. This is where the real instructions live, and it loads only when your request matches the description. The docs peg it at under 5k tokens.
Level 3 is everything else the skill bundles: reference docs, schemas, templates, helper scripts. The token cost here is listed as effectively unlimited, which sounds like a contradiction until you read the mechanism. These files get read or executed via bash only when the level-2 body references them, and a script's code never enters context at all. Claude runs the script and gets back the output. A skill can ship a 400-page API reference and a dozen scripts and cost zero tokens for any of it, right up until a task reaches for one specific file.
So the bill tracks what you use. Installing more skills barely moves it.
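The claim that a script's code never enters context can be illustrated with a small Python sketch. It writes a hypothetical helper script to a temporary folder, runs it, and keeps only what the script printed. The script name, its padding, and its output are assumptions made up for the example. The subprocess call stands in for Claude running the script via bash.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical helper script; the padding stands in for a long implementation.
SCRIPT_SOURCE = "# padding\n" * 500 + "print('fields: name, date, signature')\n"

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "list_fields.py"
    script.write_text(SCRIPT_SOURCE)
    # Run the script as a separate process and capture only its stdout.
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )

# Only the output would reach the context window, not the source.
entered_context = result.stdout
print(len(SCRIPT_SOURCE), "characters of source stayed on disk")
print(len(entered_context), "characters of output came back")
```

However long the script grows, the part that comes back is the size of its output.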
When does a Claude skill trigger?
The trigger is a match between the description and your intent. That is the entire activation mechanism, and it pays to be precise about how it works.
At startup, Claude holds the name and description of every skill in its system prompt. When you send a request, it compares what you asked for against those descriptions. If one matches, Claude reads that skill's SKILL.md from the filesystem via bash, and only then does the body enter the context window. The engineering post behind the feature says the metadata exists so Claude can "know when each skill should be used without loading all of it into context," and that "Claude will use these when deciding whether to trigger the skill in response to its current task" (anthropic.com).
Sit with that second quote for a moment. The description does the routing. It decides which skill fires, which makes it the most operationally important line in the whole file.
The Claude Code docs walk through a concrete sequence with a PDF skill (code.claude.com). At startup the system prompt carries the PDF skill's name and one-line description. You ask Claude to extract and summarize a PDF. Claude reads pdf-skill/SKILL.md via bash and pulls the instructions into context. It decides form-filling is not needed, so FORMS.md is never read, and it finishes the task. Three of the skill's files existed the whole time. One got loaded.
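The PDF sequence above can be mimicked with a toy Python model that records which files are actually opened. The LazySkill class, the file contents, and the REFERENCE.md filename are assumptions for illustration (the walkthrough names only SKILL.md and FORMS.md), and the decision to load the body is made by hand here rather than by a model.

```python
import tempfile
from pathlib import Path


class LazySkill:
    """Toy model of a skill folder that records which files were actually read."""

    def __init__(self, folder: Path, name: str, description: str):
        self.folder = folder
        self.name = name                # level 1: always in the prompt
        self.description = description  # level 1: always in the prompt
        self.files_read: list[str] = []

    def read(self, filename: str) -> str:
        """Read one file on demand and remember that it entered context."""
        text = (self.folder / filename).read_text()
        self.files_read.append(filename)
        return text


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "pdf-skill"
    folder.mkdir()
    (folder / "SKILL.md").write_text("Extraction steps. For forms, see FORMS.md.")
    (folder / "FORMS.md").write_text("Form-filling instructions.")
    (folder / "REFERENCE.md").write_text("Long API reference.")  # assumed third file

    skill = LazySkill(folder, "pdf-skill", "Use when working with PDF files.")
    # Request: extract and summarize a PDF. The description matches, so the body loads.
    body = skill.read("SKILL.md")
    # No form-filling is needed, so FORMS.md and REFERENCE.md are never opened.
    on_disk = sorted(p.name for p in folder.iterdir())

print(on_disk)           # ['FORMS.md', 'REFERENCE.md', 'SKILL.md']
print(skill.files_read)  # ['SKILL.md']
```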
Why this design keeps context cheap
Context is finite, and every token you spend on a skill you are not currently using gets stolen from the actual work. The naive way to give a model a skill is to paste the whole procedure into the system prompt. Do that across thirty skills and you have spent your budget before the user even types anything.
Progressive disclosure flips that around. The expensive content, the bodies and the bundled files, sits dormant on the filesystem and costs nothing. Only the names and descriptions ride along permanently, and those are tiny by design. So unbounded skill complexity turns into bounded context consumption, because the model loads what the current task actually calls for and leaves everything else alone.
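A rough back-of-the-envelope comparison, sketched in Python, shows the gap. It uses the two figures quoted from the docs and treats every body as if it hit the 5k upper bound, which is a worst-case assumption. The function names are made up for the example.

```python
METADATA_TOKENS = 100  # rough per-skill figure quoted from the docs
BODY_TOKENS = 5_000    # upper bound the docs give for a SKILL.md body (worst case)


def naive_cost(installed: int) -> int:
    """Every body pasted into the system prompt, used or not."""
    return installed * BODY_TOKENS


def progressive_cost(installed: int, triggered: int) -> int:
    """Metadata for every skill, plus bodies only for the skills that fired."""
    if not 0 <= triggered <= installed:
        raise ValueError("triggered must be between 0 and installed")
    return installed * METADATA_TOKENS + triggered * BODY_TOKENS


print(naive_cost(30))           # 150000
print(progressive_cost(30, 0))  # 3000
print(progressive_cost(30, 1))  # 8000
```

Under these assumptions, thirty installed skills cost about 3,000 tokens at rest and about 8,000 with one skill active, against 150,000 for the paste-everything approach.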
This is also why skills beat stuffing the same procedure into CLAUDE.md. The Claude Code docs are blunt about it: a skill's body "loads only when it's used, so long reference material costs almost nothing until you need it." A wall of instructions in CLAUDE.md loads on every single turn, whether the task needs it or not.
The practical lesson behind the loading model
A vague description means the skill never loads.
That is the failure mode, and it is a quiet one. You write a brilliant skill, the body is perfect, the bundled scripts are tested, and the thing sits there doing nothing. At decision time the only thing Claude ever sees is that one description line, and yours says something like "helps with documents." Helps with documents when? Claude cannot match an intent against a fog. The body could cure cancer and it would not matter, because level 2 never fires when level 1 does not match.
So the description carries more weight than any other line in the file. It has to state what the skill does and the concrete triggers for using it, meaning the file types, the verbs, the words a user actually says. "Use when working with PDF files or when the user mentions PDFs, forms, or document extraction" is the kind of phrasing the official examples use, and it is specific on purpose. Writing one that fires reliably is a craft of its own, which I cover in "How to write a SKILL.md description."
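As a rough self-check before shipping a skill, a small Python lint can flag descriptions that give Claude little to match against. The word-count threshold and the "use when" test are my own heuristics, not rules from Anthropic, and Claude matches intent with the model itself, not with keyword checks like these. Treat it as a sketch for catching the obvious cases.

```python
def description_warnings(description: str) -> list[str]:
    """Flag likely-vague descriptions. Heuristic only; thresholds are assumptions."""
    warnings = []
    text = description.strip().lower()
    if len(text.split()) < 8:
        warnings.append("very short: say what the skill does and when to use it")
    if "use when" not in text and "use this when" not in text:
        warnings.append("no explicit trigger clause such as 'Use when ...'")
    return warnings


VAGUE = "Helps with documents."
SPECIFIC = (
    "Extracts text and tables from PDF files. Use when working with PDF files "
    "or when the user mentions PDFs, forms, or document extraction."
)

for warning in description_warnings(VAGUE):
    print("vague:", warning)
print("specific:", description_warnings(SPECIFIC))
```

The vague description trips both checks, while the specific one passes cleanly. Passing the lint does not guarantee the skill fires. It only rules out the fog.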
There is a second payoff people forget to connect. This same loading model is why skills are nearly free to keep around, which I break down in "Are Claude Skills free?"
Getting the description right by hand takes a few tries and a feel for how Claude reads intent. Knack runs a short interview and writes the trigger-shaped description for you, then ships the whole SKILL.md folder, so the skill loads when it should instead of sitting dead on disk.
Write the description like it is the only line Claude will read before deciding, because for a moment, it is.