Here is the fastest way to make a skill flaky. Describe a deterministic operation in prose, then hope the model rewrites the same code the same way every run. It won't. Ask Claude to "validate the JSON against the schema" in plain English and you get a slightly different validator each time, carrying a slightly different bug each time. Claude skill scripts solve exactly this. You write the validator once, drop it in the skill folder, and the model runs your file instead of improvising a new one.
Most authors underuse this part of Agent Skills. A skill is a folder, and that folder can carry executable code alongside the markdown. When the model hits a step backed by a script, it runs the file through bash and reads only the output. The source never enters the context window. A 300-line Python file costs nothing in tokens until it runs, and even then you pay for "Validation passed," not 300 lines of Python.
Why Claude skill scripts beat prose for deterministic steps
Some work the model should reason about, and some work it should just execute. Anthropic's authoring best practices frame this as degrees of freedom. A code review is an open field with many valid paths, so you give prose. A database migration is a narrow bridge with cliffs on both sides, so you give one exact command and tell the model not to add flags.
Scripts build the narrow bridge. The docs list four wins for shipping a utility script even when Claude could write the equivalent: more reliable than generated code, fewer tokens, no generation time, and consistent across every use. That last win is the one I care about most. A claude skill with scripts runs the same bytes on Monday and on Friday, against staging data and production data. That is the difference between a transform you trust and one you re-check by hand.
The scripts/ folder and how SKILL.md points at it
The convention is a scripts/ subdirectory next to your SKILL.md. Here's the shape Anthropic's docs use:
pdf/
├── SKILL.md
├── reference.md
└── scripts/
├── analyze_form.py
├── fill_form.py
└── validate.py
Your SKILL.md references a script by its relative path and tells the model to run it. Wiring SKILL.md scripts this way is what makes the claude skill script path work, because the model reads the folder like a filesystem and expects forward slashes only. Backslashes break on Unix, so you write scripts/validate.py even on Windows.
A reference looks like this inside the markdown body:
## Validate the export
Run the validator before writing any output:
```bash
python scripts/validate.py export.json
```
If it prints errors, fix the file and run it again. Only proceed when it prints `OK`.
One detail trips people up in Claude Code. The working directory is wherever the user launched, not the skill folder, so a bare relative path can miss. Claude Code exposes ${CLAUDE_SKILL_DIR}, which resolves to the directory holding your SKILL.md no matter where the session started. Reach for it when you want the claude skill run script step to work whether the skill is installed personally, per project, or through a plugin:
python3 ${CLAUDE_SKILL_DIR}/scripts/visualize.py .
Be explicit about intent, too. "Run analyze_form.py to extract the fields" tells the model to execute. "See analyze_form.py for the extraction algorithm" tells it to read. For most utility code you want execution, and the docs flag mixing these up as a common authoring mistake.
A concrete example: a validator the skill calls
Say your skill produces a config file and every output has to satisfy a schema. Put the check in scripts/:
#!/usr/bin/env python3
import json, sys
REQUIRED = {"name", "region", "replicas"}
def main(path):
data = json.load(open(path))
missing = REQUIRED - data.keys()
if missing:
print(f"FAIL: missing keys {sorted(missing)}")
sys.exit(1)
if not isinstance(data["replicas"], int) or data["replicas"] < 1:
print("FAIL: replicas must be a positive integer")
sys.exit(1)
print("OK")
if __name__ == "__main__":
main(sys.argv[1])
Now the model isn't deciding what "valid" means on each run. It writes a draft, runs the validator, and reads FAIL: missing keys ['region']. Then it fixes the draft and runs the validator again. That loop is the single most useful pattern for a skill that has to be right, and Anthropic's guide calls it plan-validate-execute, recommended for batch operations and destructive changes. Make the error messages verbose so the model converges instead of guessing. A silent failure just pushes the broken output downstream, to where you find it later.
Write the script to handle its own errors rather than punting back to the model. A validator that crashes on a missing file teaches it nothing. One that prints FAIL: file not found, expected export.json keeps the loop moving.
You are shipping runnable code, so review it like it
A skill that bundles scripts is software you are installing.
Anthropic's security guidance is blunt. Use skills only from sources you trust, and if you must run one from an unknown source, audit every file first. The scripts get the same scrutiny as the SKILL.md, because they're the part that actually runs. A malicious skill can direct the model to execute code unrelated to its stated purpose, and depending on what the session can reach, that can mean data exfiltration or unauthorized access.
The surface is small enough to read. Look for unexpected network calls, file access that doesn't match what the skill claims to do, and any script pulling content from an external URL, since fetched content can carry instructions of its own.
Two Claude Code controls help here. A project skill's allowed-tools only takes effect after you accept the workspace trust dialog, so review the repo before trusting it. You can also scope exactly what a script may invoke, which is covered in restricting a skill's allowed tools. For a structured pass over a skill someone handed you, the SKILL.md security audit walks the whole folder.
When a script is overkill
Don't reach for code when the step is a real judgment call. If multiple approaches are valid and the right one depends on context, prose wins, because a rigid script fights the model rather than helps it. Summarizing a document, drafting a commit message, deciding how to structure a report: these are open fields. Force them through a script and you make the skill brittle while losing the thing the model is good at.
The test I use is simple. Would I want this step to produce byte-identical behavior every time? If yes, write the script. If the right output legitimately varies with the input in ways a human would call "it depends," leave it as instructions and give the model good examples instead.
Bundling scripts is the fiddly part of authoring a skill by hand: real code to maintain, path references to wire, execute-versus-read intent to keep straight. Knack scaffolds that from a short interview. It generates the SKILL.md, drops helper scripts into scripts/, and writes the references with the right relative paths so the model runs them instead of regenerating them. You still review the code, because it's still code you'll run.
The full folder layout, where the claude skill resources folder, references, and assets each sit alongside scripts/, lives in Claude skill folder structure. If you're starting from zero, the SKILL.md template gives you a base file to paste script references into. Write the validator once. Stop re-checking the model's homework.