knack

How to Write a SKILL.md Description That Actually Triggers

Apart from the name, the description is the only thing loaded at discovery time, so it carries the whole trigger. How to phrase it, what to name, the length sweet spot, and how to test that it fires.

Your skill works. The script runs, the output is clean. Then you ask Claude the exact question it was built for and nothing happens. Claude solves the problem its own way, ignores the skill, and you start wondering whether the thing is even installed.

It's installed. The body is fine. What broke is almost always the skill description. So this is a guide to writing a skill description that fires when it should, the part of authoring that decides whether any of your other work ever gets seen.

I'll start with the one fact that makes the rest make sense.

Why the description is the only thing that matters at discovery time

At startup, an agent does not read your SKILL.md. It reads the name and the description from the frontmatter of every installed skill, and nothing else. That metadata costs roughly 100 tokens per skill, and it's the entire basis on which the agent decides whether your skill is relevant to the current request. The Agent Skills specification calls this progressive disclosure and lays it out in three levels: metadata at startup (name plus description), the full SKILL.md body once a skill is activated, and bundled files in scripts/, references/, or assets/ only when a task actually reaches for them.
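To make that split concrete, here's a minimal Python sketch of what discovery actually reads. It assumes the frontmatter sits between --- lines and uses simple one-line key: value pairs. Real frontmatter is YAML, so anything with multi-line values needs a proper YAML parser. The function name discovery_metadata is made up for illustration and is not any agent's real API.

```python
def discovery_metadata(skill_md_text: str) -> dict[str, str]:
    """Return the only fields an agent reads at startup: name and description.

    Naive sketch: expects '---' delimiters and single-line 'key: value'
    pairs. Real frontmatter is YAML; use a YAML parser for anything richer.
    """
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md should open with a '---' frontmatter block")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body below is never parsed here
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    else:
        raise ValueError("frontmatter block is never closed")
    # Everything else, including the body, stays unloaded until activation.
    return {k: fields[k] for k in ("name", "description") if k in fields}


EXAMPLE = """---
name: processing-pdfs
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---
# PDF processing

Detailed instructions the agent reads only after the skill is activated.
"""

print(discovery_metadata(EXAMPLE))
```

Nothing below the closing --- has any say in whether the skill gets picked.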

That has a consequence worth sitting with. The body of your skill, the careful instructions, the validation loop, the worked examples, none of it is loaded when the agent decides whether to use the skill. The agentskills.io guide on optimizing descriptions puts it plainly: "the description carries the entire burden of triggering. If the description doesn't convey when the skill is useful, the agent won't know to reach for it."

So when people ask how to write a skill description, the honest answer is that you're not writing a summary. You're writing the matcher. The agent compares the user's message against your description text and makes a relevance call, and every skill description best practice worth following comes from treating that field as a trigger rather than a label.

Trigger phrasing: write what the user says, not what the skill is

The single most common failure is a description written from the author's point of view. "A tool for advanced spreadsheet manipulation and statistical rollups." It's accurate, and it's useless for matching, because no user types "I'd like advanced spreadsheet manipulation." They type "can you add a profit margin column to this xlsx and flag anything under ten percent."

Anthropic's skill authoring best practices give two rules that fix most of this. Write in the third person, because the description gets injected into the system prompt and a first-person "I can help you..." muddies the point of view the agent reasons from. And name both what the skill does and when to use it. Their PDF example reads: "Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction."

The shape is the lesson here. A clause about capability, then a "Use when" clause that lists the situations and the words a user would actually say. The agentskills.io guide leans even harder on the "Use when..." framing and tells you to be a little pushy: list contexts explicitly, including ones where the user describes the need without naming your domain, "even if they don't explicitly mention 'CSV' or 'analysis.'"
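If you want a quick self-check for that shape, a rough lint is easy to sketch. The function below is hypothetical. It flags first-person words from a small hand-picked list, so expect the odd false positive on something like "i.e.". It also checks for a "Use when" clause and splits that clause into the individual contexts it names, which shows at a glance how many situations you've actually listed. It can't tell you whether those are the words your users type. Only testing does that.

```python
import re
from dataclasses import dataclass, field

# Illustrative word list, not an official rule set.
FIRST_PERSON = {"i", "i'm", "i'll", "i've", "me", "my", "we", "our"}


@dataclass
class DescriptionReport:
    problems: list[str] = field(default_factory=list)
    use_when_contexts: list[str] = field(default_factory=list)


def lint_description(description: str) -> DescriptionReport:
    report = DescriptionReport()
    # Third person: the text is injected into the system prompt.
    words = re.findall(r"[A-Za-z']+", description)
    first_person = [w for w in words if w.lower() in FIRST_PERSON]
    if first_person:
        report.problems.append(f"first-person wording: {first_person}")
    # "Use when": find the clause, then split it on commas and "or".
    match = re.search(r"\buse when\b", description, re.IGNORECASE)
    if match is None:
        report.problems.append("no 'Use when' clause naming situations")
        return report
    clause = description[match.end():].strip().rstrip(".")
    parts = re.split(r",\s*(?:or\s+|and\s+)?|\s+or\s+", clause)
    report.use_when_contexts = [p.strip() for p in parts if p.strip()]
    return report


PDF_EXAMPLE = ("Extract text and tables from PDF files, fill forms, merge documents. "
               "Use when working with PDF files or when the user mentions PDFs, forms, "
               "or document extraction.")
print(lint_description(PDF_EXAMPLE))
print(lint_description("I can help you with PDFs."))
```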

That one phrase is most of the craft. A user asking "my manager wants a chart from this data file" never says CSV, and a good description anticipates the request behind the request.

Name the verbs, the file types, and the artifacts

Vague descriptions fail because semantic matching has nothing concrete to grab. "Helps with documents" matches everything, so it distinguishes nothing. Give the matcher real nouns and verbs.

List the file extensions your skill handles: .xlsx, .docx, .pdf, .csv. List the artifacts it produces or consumes: pivot tables, commit messages, migration files, release notes. List the verbs that describe the action: extract, merge, lint, redline, backfill. Anthropic's Excel example does all three at once: "Analyze Excel spreadsheets, create pivot tables, generate charts. Use when analyzing Excel files, spreadsheets, tabular data, or .xlsx files." That's four trigger surfaces in one sentence, each a different way a user might phrase the same need.

Naming matters here too. The name field rides along with the description at discovery time. Anthropic suggests gerund forms like processing-pdfs or analyzing-spreadsheets and warns against vague names like helper, utils, and tools. A name like utils carries zero trigger signal. By contrast, analyzing-spreadsheets carries some on its own, before the description is even read.
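The naming advice is easy to check mechanically. This sketch treats it as a soft style hint, not validation. It flags a bare generic name and nudges toward a gerund first word. The GENERIC_NAMES set contains only the three names mentioned above and is not an official list. A non-gerund name is not invalid.

```python
# Only the generic names called out above; extend for your own conventions.
GENERIC_NAMES = {"helper", "utils", "tools"}


def name_warnings(name: str) -> list[str]:
    """Soft style hints for a skill name (heuristics, not spec rules)."""
    warnings = []
    if name in GENERIC_NAMES:
        warnings.append(f"'{name}' carries no trigger signal; name the activity instead")
    # Gerund check: the first hyphen-separated word should end in "ing".
    first_word = name.split("-")[0]
    if not first_word.endswith("ing"):
        warnings.append(f"consider a gerund form such as 'processing-pdfs' (got '{name}')")
    return warnings


for candidate in ("utils", "analyzing-spreadsheets"):
    print(candidate, name_warnings(candidate))
```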

For non-coders, this is the hardest single part of building a skill. Describing what you want in plain English is easy. Turning that into a tight, trigger-tuned description field, with third person, the right verbs, the file types a user would mention, and the implicit phrasings, is a craft most people have never practiced. Knack handles exactly this. You answer interview questions about the workflow in your own words, and Knack writes the SKILL.md description from your answers, tuned for triggering instead of for sounding nice. You never touch YAML.

The length sweet spot

There's a hard ceiling and a soft target, and they're different numbers worth keeping straight.

The hard ceiling from the spec is 1024 characters for the description field. It's non-negotiable; the file is invalid above it. Claude Code adds a wrinkle on top: in the skill listing, the combined description and optional when_to_use text is truncated at 1,536 characters, and that listing budget is shared across every skill you have installed. Put your most important use case first, because the tail is what gets cut when the budget tightens.

The soft target sits well under the ceiling. The agentskills.io guidance: "A few sentences to a short paragraph is usually right, long enough to cover the skill's scope, short enough that it doesn't bloat the agent's context across many skills." A two-line description that names the right triggers beats a 1000-character one that buries them. Descriptions also creep longer every time you tweak them, so check the count after each edit. If you run many skills, the budget pressure compounds. How description length interacts with the number of skills that can be listed at once is its own topic, covered in the post on the skill listing budget.
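Both numbers are easy to check in a script after each edit. This is a minimal sketch. It assumes Python's string length is a fair stand-in for the character count. It also assumes, purely for illustration, that Claude Code joins description and when_to_use with a single space before truncating. The real listing format may differ. It models only the per-skill 1,536-character cap, not how the shared budget is divided across skills.

```python
DESCRIPTION_LIMIT = 1024  # spec hard ceiling; the file is invalid above this
LISTING_LIMIT = 1536      # Claude Code truncates description + when_to_use here


def listing_text(description: str, when_to_use: str = "") -> str:
    # Assumption: the fields are joined with one space. Claude Code's real
    # listing format may differ; the point is that the tail gets cut.
    combined = f"{description} {when_to_use}".strip()
    return combined[:LISTING_LIMIT]


def check_lengths(description: str, when_to_use: str = "") -> list[str]:
    problems = []
    if len(description) > DESCRIPTION_LIMIT:
        problems.append(f"description is {len(description)} chars; hard limit is {DESCRIPTION_LIMIT}")
    combined_len = len(f"{description} {when_to_use}".strip())
    if combined_len > LISTING_LIMIT:
        problems.append(f"listing text is {combined_len} chars; the last {combined_len - LISTING_LIMIT} will be cut")
    return problems


print(check_lengths("x" * 1000, when_to_use="y" * 600))
```

Whatever falls past the cut is the tail, which is why the most important use case belongs at the front.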

Anti-patterns: too vague and too broad

There are two failure modes, in opposite directions.

Too vague is the "helps with documents" problem. The description doesn't carry enough specific signal, so the agent can't tell when the skill applies, and it sits there unused. The fix is concreteness: the verbs, file types, and artifacts from a couple of sections up.

Too broad is sneakier, because the skill does fire, just at the wrong times. A description that claims half the problem space will trigger on near-misses. The agentskills.io guide makes the cure explicit through its negative test cases. A CSV analysis skill should not trigger on "write a python script that reads a csv and uploads each row to our postgres database," because that request shares the word "csv" but the actual task is database ETL, not analysis. If your skill keeps firing on requests like that, the description is too broad. Add specificity about what the skill does not do, or clarify the boundary against the adjacent capability the request actually needs.

One more reality check from the same guide. Agents tend to consult skills only for tasks that need capability beyond what they can already do. A bare "read this PDF" might not trigger a PDF skill even with a perfect description, because the agent can just read the PDF. The descriptions that earn their keep cover specialized formats, unfamiliar APIs, and domain workflows the agent wouldn't otherwise know to handle a particular way.

How to actually test that it triggers

You don't have to guess. You can measure trigger rate, and the agentskills.io guide gives a method worth following.

Build a set of about 20 eval queries, each labeled should_trigger true or false. Make the positives realistic: file paths like ~/Downloads/report_final_v2.xlsx, personal context like "my manager asked me to," real column names, casual phrasing, the occasional typo. The positives that matter most are the ones where the skill would help but the connection isn't obvious from the words alone, since those are exactly where description wording decides the outcome. For the negatives, reach for near-misses, queries that share keywords but need something different, because those test precision instead of mere coverage.
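Here is one way to hold such a set in code, with a seeded split for the train/validation practice the guide recommends. The queries are hypothetical and written for an imagined spreadsheet/CSV analysis skill. Five entries stand in for the roughly 20 you'd actually want. The 0.4 validation fraction is an arbitrary choice for the sketch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalQuery:
    text: str
    should_trigger: bool


# Hypothetical queries for a spreadsheet/CSV analysis skill. A real set would
# have about 20, weighted toward positives that never name the domain.
EVAL_SET = [
    EvalQuery("my manager asked me to chart monthly revenue from ~/Downloads/report_final_v2.xlsx", True),
    EvalQuery("can you add a profit margin column to this xlsx and flag anything under ten percent", True),
    EvalQuery("whats the avg order_total per region in this export, boss wants it by 3", True),
    # Near-misses: they share keywords but need something else.
    EvalQuery("write a python script that reads a csv and uploads each row to our postgres database", False),
    EvalQuery("convert this csv to json so the frontend can load it", False),
]


def split_train_validation(queries, validation_fraction=0.4, seed=0):
    """Seeded split that keeps positives and negatives in both halves."""
    rng = random.Random(seed)
    train, validation = [], []
    for label in (True, False):
        group = [q for q in queries if q.should_trigger == label]
        rng.shuffle(group)
        n_val = max(1, round(len(group) * validation_fraction))
        validation += group[:n_val]
        train += group[n_val:]
    return train, validation


train, validation = split_train_validation(EVAL_SET)
print(len(train), "train /", len(validation), "held back")
```

Splitting per label guarantees that the held-back half still contains near-misses. Without that, a lucky shuffle could hide a precision problem.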

Then run each query a few times, because model behavior is nondeterministic and the same prompt can trigger on one run and miss on the next. Three runs is a reasonable start. Compute the fraction of runs where the skill fired. Call a should-trigger query passing if that rate clears about 0.5, and a should-not-trigger query passing if its rate stays below that line. In Claude Code you can detect a fire by checking the JSON output for a Skill tool call matching your skill name. To avoid tuning a description that works only on the exact phrasings you tested, split the queries into a train set you tune against and a validation set you hold back. Then pick the description with the best validation pass rate rather than the last one you wrote.
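The scoring side fits in a few small functions. Treat this as a sketch under stated assumptions, not a reference harness. skill_fired assumes each non-empty output line is a JSON event. It also assumes that an activation appears somewhere as an object with "name": "Skill" whose "input" holds your skill name as a value. Confirm that shape against your own Claude Code output before relying on it. Collecting the output, for example by running Claude Code non-interactively with JSON output, is left out, since the details depend on your setup. All function names here are hypothetical.

```python
import json


def skill_fired(output_lines: list[str], skill_name: str) -> bool:
    """Return True if one run's JSON output shows a Skill tool call for skill_name.

    Assumption: each non-empty line is a JSON event, and an activation appears
    somewhere as an object with "name": "Skill" whose "input" dict has the
    skill name as a value. Confirm against your Claude Code version's output.
    """

    def walk(node) -> bool:
        if isinstance(node, dict):
            tool_input = node.get("input")
            if (node.get("name") == "Skill" and isinstance(tool_input, dict)
                    and skill_name in tool_input.values()):
                return True
            return any(walk(value) for value in node.values())
        if isinstance(node, list):
            return any(walk(item) for item in node)
        return False

    return any(walk(json.loads(line)) for line in output_lines if line.strip())


def query_passes(fired_runs: list[bool], should_trigger: bool, threshold: float = 0.5) -> bool:
    """Score one query from its repeated runs (e.g. three runs per query)."""
    rate = sum(fired_runs) / len(fired_runs)
    # Positives must clear the threshold; negatives must stay under it.
    # A rate of exactly 0.5 fails either way: a coin flip isn't reliable.
    return rate > threshold if should_trigger else rate < threshold


def pass_rate(results: list[tuple[bool, list[bool]]]) -> float:
    """results holds (should_trigger, fired_runs) for each query in one split."""
    return sum(query_passes(runs, label) for label, runs in results) / len(results)


def pick_description(validation_pass_rates: dict[str, float]) -> str:
    """Choose the candidate with the best validation score, not the newest one."""
    return max(validation_pass_rates, key=validation_pass_rates.get)


# Invented event shape, for illustration only.
run = ['{"type": "assistant", "message": {"content": [{"type": "tool_use", '
       '"name": "Skill", "input": {"skill": "analyzing-spreadsheets"}}]}}']
print(skill_fired(run, "analyzing-spreadsheets"))
print(query_passes([True, True, False], should_trigger=True))
```

A positive query that fires on two of three runs passes. A near-miss that fires on two of three runs fails, and that failure is the signal that the description is too broad.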

When a real skill stays quiet, Claude Code's troubleshooting steps are the fast first pass. Confirm the description includes words a user would naturally say. Ask "What skills are available?" to verify the skill is even listed. Then try rephrasing your request to match the description more closely. If a description gets cut off in the listing, /doctor will tell you whether the budget is overflowing and which skills lost their text.

A description that triggers reliably is the precondition for everything else your skill does. Write it as the matcher it is, name the things a user would name, keep it tight, and test it against queries you wrote before you started tuning. And if you'd rather not hand-tune YAML, Knack writes a trigger-tuned description straight from your interview at getknack.ai. The whole flow is shown in the post on building a skill with no code.