knack
← all posts

Claude Skill Best Practices: The Rules That Decide Whether Your Skill Fires

The authoring rules that separate a skill that fires and works from one that does not: description, scope, structure, scripts, and testing. A hub linking each deep dive.

A skill that never loads is worse than no skill, because you spent an afternoon on it and now you trust it to be there. The gap between a skill that activates on the right task and one that sits dead in the folder comes down to a handful of authoring decisions. Anthropic documents almost all of them. People ignore almost all of them, then paste a wall of instructions into a SKILL.md and call it done. The claude skill best practices below cover most of how to write a good one. Write a description that says when to fire. Keep the main file small and push the rest into referenced files. Name it well, scope what it can touch, hand deterministic work to scripts, and test it. Each section links a deeper post for the full mechanics.

The description is the only thing Claude reads at startup

This is the decision that governs all the others, and it is the one people half-write.

At startup, Claude does not load your skill body. It loads only the name and description from the frontmatter of every installed skill, then uses that metadata to decide which skill fits the task. A vague description means no fire, and then nothing else you wrote matters. Anthropic's skill authoring guidance is blunt about it: the description has to say both what the skill does and when to use it, in third person, because it gets injected into the system prompt and a mixed point of view hurts discovery.

Compare two of them. "Helps with documents" is dead on arrival. "Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction" fires reliably, because it names the triggers a real request would contain. The field caps at 1024 characters, so you have room to do this. That "Use when" clause is the part most people drop, and it is the part that does the work.

For more on claude skill description best practices, see how to write a SKILL.md description.

Keep SKILL.md lean and let the rest load on demand

Once a skill fires, its body competes for context with everything else Claude is holding. Reference files cost nothing until Claude reads them, but the main file loads on activation, so it is the one that has to stay small.

Anthropic's number is concrete: keep the SKILL.md body under 500 lines, and split content into separate files as you approach it. Treat the main file like a table of contents. A quick-start example, then pointers. Form filling in FORMS.md, the API surface in reference.md, edge cases in examples.md, each read only when the task calls for it. That is progressive disclosure, and it is what lets a skill carry a hundred pages of reference without paying for it every turn.

One trap here. Keep references one level deep from SKILL.md. When the main file points to advanced.md which points to details.md, Claude tends to preview the nested file with head -100 rather than read it whole, and you get half the instructions silently.

The SKILL.md template gives you the base file, and the skill folder structure guide covers progressive disclosure in depth.

Name it like you will have forty of these

Naming feels cosmetic until you have a directory full of skills and Claude is choosing between them on metadata alone. The name field allows lowercase letters, numbers, and hyphens, caps at 64 characters, and cannot contain "claude" or "anthropic." Within those limits, Anthropic recommends the gerund form: processing-pdfs, analyzing-spreadsheets, testing-code. Verb-plus-ing reads as a capability at a glance.

Avoid helper, utils, tools, and data. A vague name drags down discovery the same way a vague description does.

Scope what the skill is allowed to touch

A deploy skill should not be something Claude decides to run because your code looks finished.

Two frontmatter fields in Claude Code give you the controls, documented on the Claude Code skills page. disable-model-invocation: true makes a skill manual-only, so it runs when you type /name and never on Claude's initiative. Use it for anything with side effects, like a commit or a deploy. The allowed-tools field pre-approves a specific tool set while the skill is active, so a commit skill can run Bash(git add *) without prompting each time, while your normal permission rules still govern everything else.

There is a trust decision buried in this. A project skill checked into .claude/skills/ can grant itself broad tool access once you trust the folder, so review what it pre-approves before you accept the workspace dialog. Scoping is real authoring work, and skipping it is how a skill ends up doing things you never sanctioned.

Hand deterministic work to scripts, not prose

If a step must run the same way every time, write a script and tell Claude to run it. Do not describe the step in English and hope.

Anthropic frames this as degrees of freedom. Open-ended work like a code review wants high freedom and the model's judgment. Fragile work like a database migration wants low freedom, which means one exact command, run it, do not improvise. Bundled scripts beat generated code on reliability. They cost no context tokens, because Claude executes them via bash rather than reading them in, and they stay consistent across runs. Be explicit about your intent, though. Say whether Claude should execute a script or read it as reference, because it will guess wrong if you leave that ambiguous.

The deeper patterns (plan-validate-execute, verifiable intermediate outputs, error handling that solves instead of punting) live in using scripts in Claude skills.

Test it before you ship, ideally against evals you wrote first

This separates people who think their skill works from people who know.

Anthropic recommends building evaluations before you write extensive documentation. Run Claude on representative tasks with no skill, document where it fails, then write at least three evaluation scenarios that target those gaps. Establish a baseline, write the minimum to pass, and iterate against the evals rather than your intuition. Test across the models you actually use, because what reads as over-explaining to Opus might be the scaffolding Haiku needs.

Their workflow has one trick I like. Use one Claude to author the skill and a fresh Claude to use it, then watch where the second one stumbles. A model that ignores a rule already in the skill is telling you the rule is not prominent enough, and maybe it needs "MUST filter" instead of "always filter." Watching the behavior beats reviewing the text.

The mechanics of writing and running these evals are in testing and evals for SKILL.md.

The claude skill best practices that actually matter

Most of these skill md best practices are mechanical. None of them is hard. They are just easy to skip, and skipping them is why so many skills quietly never fire.

That mechanical quality is why Knack can enforce most of it for you. It turns a short interview into a shippable Anthropic-format skill that runs in Claude Code, the Claude apps, Codex, Cursor, and Gemini CLI, applying the rules above by default: a description in the what-plus-when shape, a lean main file with detail in referenced files, a clean name. You tune from there.

Write the description first. If you cannot state when the skill should fire in one sentence, the skill is not ready, and no amount of body text will save it.