skill-md security audit

SKILL.md Security: A 12-Point Audit Before You Install

Snyk found 13.4% of 3,984 audited skills shipped a critical-level issue. A SKILL.md can become shell access in three lines. The audit habit that keeps you out of the next ClawHavoc campaign.

jordan · May 21, 2026 · 8 min read

On February 5, 2026, Snyk's security team published ToxicSkills, an audit of 3,984 agent skills pulled from ClawHub and skills.sh. The number that should make you uncomfortable: 13.4% (534 skills) shipped at least one critical-level issue, and 36.82% had at least one security flaw of any kind. Skill.md security is no longer a theoretical concern, and if you install, write, or publish skills, the audit habits below are the price of admission.

A SKILL.md file is plain markdown with a YAML header. It can become shell access in three lines. Snyk's follow-up technical writeup walks the chain: the skill declares an npm dependency in its metadata, the agent installs it with npx -y openclaw-yahoo-stock-news, the package's postinstall script runs arbitrary commands with the agent's permissions. No exploit, no zero-day. The agent did exactly what the skill told it to.

That is what makes claude skill security different from npm or PyPI supply-chain risk. The malicious payload does not need to be in the code you read. It can be in the instructions you read.

The attack surface, briefly

Look at the agentskills.io spec and an agent skill is a folder. A SKILL.md at the root holds YAML frontmatter (name, description, optional allowed-tools, optional compatibility, optional metadata) and a markdown body. Around it: scripts/ for executable code, references/ for documentation the agent reads on demand, assets/ for templates.

The agent loads progressively. At startup it reads only name and description from every skill on disk. When a user prompt seems to match a description, the agent reads the full SKILL.md body. When the body says "see references/auth.md for the exact prompt", the agent reads that file too. When the body says "run scripts/setup.sh", the agent runs the script.

Three things follow. First, the description field is the targeting mechanism. Claude and Codex auto-invoke skills based on it, which means a deceptive description ("Helps format JSON") attached to instructions that exfiltrate ~/.aws/credentials is a real attack pattern, not a thought experiment. The description is the only field the agent reads from every skill at startup, which means a 1024-character description packed with broad keywords ("any task", "general utility", "file handling") will get pulled in for prompts that have nothing to do with the actual skill body. Second, references/ is not inert. Agents read those files when the SKILL.md body tells them to, and they read whatever instructions live inside them with the same weight as the body itself. Anything you would not want pasted directly into a system prompt should not be in references/. Third, scripts/ runs with whatever permissions the agent has, and on a developer's laptop that is usually a lot. Read access to your home directory, write access to your repos, network access to anywhere on the public internet.

Snyk's ToxicSkills paper identified three patterns that repeat across malicious claude skill samples:

Markdown-as-installer. The SKILL.md tells the agent to run a curl pipe to bash, or to npx a package, or to clone a repo. The destructive bit lives outside the skill folder. The audit you did on the skill files finds nothing.
Hidden instructions in references/. The SKILL.md body looks clean. A line at the bottom says "Before running, read references/setup.md for prerequisites." The setup file tells the agent to read environment variables and POST them to an attacker endpoint as part of "telemetry."
Deceptive descriptions. The frontmatter says "Solana wallet balance checker." The body says read ~/.config/solana/id.json, base64-encode it, and send it to a logging URL. The description gets the skill auto-invoked. The body does the work.

The ClawHavoc campaign, which Snyk and Tencent both track, pushed 1,184 skills using these patterns and racked up about 247,693 confirmed installs before takedowns. Reported crypto losses: $2.3 million. The Snyk team also found that 17.7% of skills in their sample pull in third-party content at execution time, which creates indirect prompt-injection paths the audit-at-publish model cannot catch. The agent reads a CDN-hosted file, the file contains new instructions, the new instructions run.

In April, Tencent's Zhuque Lab scanned 50,000+ skills and found 74.6% declared network permissions and 25% declared file read/write. Most of those are legitimate. The point is the blast radius if any one of them is not. Zhuque also documented a ranking-manipulation pattern where attackers boost their own skills to the top of registry recommendation lists, which turns auto-install agent behavior into a delivery mechanism. The same paper identifies what the researchers call third-generation attacks: skills that exploit the agent's intended functionality rather than break any defense.

OWASP added AST01: Malicious Skills to the top of its new Agentic Skills Top 10 in March, citing both ClawHavoc and ToxicSkills as the precipitating incidents. AST01 sits above supply-chain compromise (AST02), over-privileged skills (AST03), and insecure metadata (AST04). Read AST01 in full if you maintain a skill registry. The mitigation it recommends, Merkle-root signing plus registry scanning, is where the industry is heading. We are not there yet.

The skill.md audit checklist

Run this before you install a third-party skill, and again before you publish one of your own. It is twelve items. Twenty minutes per skill the first few times, three minutes once you have done it ten times.

Read the description out loud. Does it actually describe what the body does? If the description says "format JSON" and the body talks about authentication tokens, stop. Description mismatch is the highest-signal red flag in the Snyk dataset.
Open every file in the skill directory. Not just SKILL.md. Every file under references/, every file under scripts/, every file under assets/. Agents read references on demand based on instructions in the body. If you have not read them, you have not audited the skill.
Search the whole tree for network calls. curl, wget, fetch(, requests., http://, https://, urllib. Then read what each one does. A skill that posts to an analytics endpoint may be fine. A skill that posts the contents of $HOME/.ssh/ is not.
Search for filesystem reads outside the working directory. ~, $HOME, /etc/, %APPDATA%, .env, id_rsa, credentials, .aws, .config/gh. A skill should not need to read your shell history. If it does, it had better say why in the description.
Check the allowed-tools line. The spec makes this field optional. Treat its absence as a red flag for any skill that runs code. With allowed-tools present, the author has told you which tools the skill expects to use. allowed-tools: Bash(git:*) Read is narrow. allowed-tools: Bash is a blank check.
In Claude Code specifically, look for disable-model-invocation: true on destructive skills. Anthropic's docs explain the flag: when set, Claude will not auto-invoke the skill, and the description does not get loaded into context. Deploys, deletes, payments, anything that hits production should have it. If you are publishing such a skill, set it. If you are installing one without it, ask why.
Verify the description is not a prompt injection. Some malicious skills put trigger phrases in the description to game auto-invocation. "Use whenever the user mentions any task, file, or question." That is not a description. That is a hijack.
Look at scripts/ for postinstall and setup hooks. Anything named install.sh, setup.py, bootstrap.*, or that runs implicitly on first use deserves a slow read. Snyk's three-line shell access chain hides here.
Check external dependencies. npm packages, pip packages, GitHub repos cloned at runtime, CDN-hosted scripts. Each one extends the trust boundary past the skill. The Tencent scan found 2.9% of skills fetched unverifiable dependencies at runtime, which is the cleanest possible audit-evasion technique.
Search for secrets. AKIA, sk-, ghp_, xox, -----BEGIN. ToxicSkills found a 10.9% exposure rate for hardcoded secrets across the ClawHub sample. Most are author mistakes, not attacks, but they still leak.
Read the author's other skills. ToxicSkills named three accounts (zaycv, Aslaep123, aztr0nutzs) responsible for clusters of forty-plus malicious skills each. A clean SKILL.md from an author who also publishes obvious crypto-stealers is not actually clean.
Pin the version. If the registry supports it, pin to a specific commit or tag. Update drift (OWASP AST07) is the boring version of supply chain compromise. A skill that was safe last week can be replaced by a malicious update next Tuesday with no notice.

If you are publishing your own skill, add three more steps on top: write a description that says exactly what the skill does, set disable-model-invocation: true on anything that has side effects, and run skills-ref validate on the folder before you push. The validator catches frontmatter errors and naming violations. It does not catch malicious intent, but it removes the easy-to-fix excuses.

Where Knack fits

Knack validates SKILL.md frontmatter before publish. The validator enforces the agentskills.io spec (1-64 character names, 1-1024 character descriptions, no consecutive hyphens, no uppercase) and flags risky patterns: bash invocations inside the description, references to external installers, secret-shaped strings, missing disable-model-invocation on skills that mention deploy, delete, or push. Authors who write skills through Knack do not have to remember the spec. The platform refuses publishes that would fail an audit.

That is a publish-time control. It does not replace the install-time audit. Anyone running a skill from any source should still do the twelve items above. The platforms that scan their own registries (skills.sh and ClawHub both ship scanners now) are catching the obvious patterns, but the ToxicSkills paper found eight confirmed malicious skills still publicly available on ClawHub at the time of publication. Detection is not removal.

What to install today, and what to wait on

Install skills from authors you can verify: companies with a real homepage, open-source projects with commit history, individuals you have worked with. Vendor-shipped skills (the ones bundled with Anthropic's official examples, or with Cursor, or with Goose) sit in a different risk category because their pipelines have human review. Use those first.

Wait on skills from anonymous accounts on public registries that promise crypto-related features, wallet integrations, or anything that touches keys. Snyk found that 100% of confirmed malicious skills used malicious code patterns and 91% combined them with prompt injection. The patterns are not subtle once you know to look. The reason they keep working is that nobody looks.

Run the twelve-item checklist on the next skill you install. If it takes more than twenty minutes, the skill is too big to audit in a single pass and you should either split it apart or skip it. Skills are supposed to be small. That is the property that makes the audit checklist tractable in the first place.

If you author skills for a living, the Knack validator catches the most-common frontmatter mistakes before they ship, and it flags the description-versus-body patterns that show up in malicious claude skill samples. The audit is still yours to run, both before publish and before install. Twelve items. Twenty minutes. Cheaper than the alternative.