Every 4th AI Skill You Install May Be Toxic

AI security SkillSpector NVIDIA Agent skill open source

发布于 2026-07-02 10:53:10 5 次浏览

Every 4th AI Skill You Install May Be Toxic

42,447 AI Agent skill samples, 26.1% with vulnerabilities, 5.2% suspected malicious.

This is not an alarmist clickbait title; it's the conclusion of an empirical study published by Liu et al. in 2026. Put plainly: for every 4 skills you install, 1 is exposed; for every 20 skills, 1 is stealing from you.

This week, NVIDIA open-sourced SkillSpector, which shot to GitHub Trending with 9,933 stars and 784 forks. It does one simple thing: scan before you install a skill.

Why skills are more dangerous than plugins

When a browser extension goes bad, it may crash one page. When an npm package goes bad, at least there are lock files, audit tools, and CI pipelines to catch it.

But Agent skills are different.

They are not ordinary code. They are behavioral manifests — directly telling the Agent: "you can work like this, call these tools, read these files, execute these scripts." If the manifest hides malicious instructions, the Agent won't treat them as an attack; it will execute them as task rules.

That's the insidious nature of skills: they exploit trust vulnerabilities, not code vulnerabilities.

You install a "PDF generation" skill. It can indeed generate PDFs. But it might also be doing:

Iterating os.environ and sending your API keys to an external server
Adding a cron persistence task in the install script
Leaking system prompts through prompt injection
Declaring read-only permissions while actually writing files

You won't notice. Because when the Agent executes these actions, it doesn't ask "do you allow?" — it treats them as normal behavior defined by the skill.

How SkillSpector scans

Not keyword matching, but a two-stage pipeline.

Stage 1: Static analysis. Regex rules + Python AST behavior analysis + dangerous call detection + YARA signature matching. No code execution; purely file content inspection.

Stage 2: Optional LLM semantic evaluation. Let the model judge more subtle risks — for instance, seemingly normal instructions that actually induce the Agent to perform unauthorized operations.

Covers 65 vulnerability patterns across 16 categories:

Category	Typical Risk
Prompt Injection	Hidden instructions, instruction overwriting
Data Exfiltration	Environment variable harvesting, external communication
Privilege Escalation	Excessive autonomous behavior, undeclared capabilities
Supply Chain Risk	Malicious dependencies, CVE packages
MCP Tool Poisoning	Permission claims don't match actual behavior
Memory Poisoning	Contaminating Agent long-term memory
System Prompt Leakage	Inducing Agent to output internal instructions

After scanning, it gives a risk score from 0-100, mapped to LOW/MEDIUM/HIGH/CRITICAL, and a recommendation: SAFE, CAUTION, DO_NOT_INSTALL.

Practical: 5 minutes to get started

Install with uv, no need to clone the repo:

uv tool install git+https://github.com/NVIDIA/SkillSpector.git

Scan a local skill directory:

skillspector scan ./my-skill/

Scan a single SKILL.md:

skillspector scan ./SKILL.md

Generate JSON report (for automation):

skillspector scan ./my-skill/ --format json --output report.json

Generate SARIF (for CI/CD or IDE):

skillspector scan ./my-skill/ --format sarif --output report.sarif

Don't want to send file contents to an LLM? Run only static analysis:

skillspector scan ./my-skill/ --no-llm

The --no-llm option isn't an optional extra. The README explicitly warns: LLM semantic analysis sends file contents to the provider you configure. If you're scanning internal company skills, the content may contain trade secrets — in that case, --no-llm is not a choice, it's a necessity.

What makes it better than manual review

Manual review of skills: the problem isn't that you can't understand it, it's that you can't see it all.

A skill may simultaneously contain Markdown instructions (hiding prompt injection), Python scripts (calling exec/subprocess), dependency declarations (referencing packages with CVEs), MCP configurations (declaring excessive permissions), and output processing logic (inducing exfiltration). Relying on human inspection easily misses the connections between contexts.

SkillSpector breaks these risks into rules and scores, covers all 65 patterns, delivers results in seconds, and can be integrated into CI as a gate.

But more critically, it acknowledges its own boundaries.

The README states clearly: SkillSpector does not execute the scanned skill; all analysis is static. It flags risks before you install, not isolate risks after installation.

This is more honest than tools that imply "use it and be safe."

Why NVIDIA built this

NVIDIA is not just open-sourcing a tool. It's building a skill governance system.

According to the NVIDIA Technical Blog, before a skill enters the NVIDIA Skills directory, it must first pass a SkillSpector scan. This turns security checks into a mandatory step in the release process — not a "we recommend you scan," but a "no scan, no publish."

At the same time, NVIDIA also introduced Verified Agent Skills and Skill Cards, adding certification labels to skills that pass audit. The intention of this combination is clear: define the security standards for Agent skills, and then become the standard-setter.

This mirrors the same logic as Docker Hub's image scanning, npm's audit, and PyPI's security checks — whoever controls the security gate controls the ecosystem's voice.

The Agent skill ecosystem has exploded from fewer than 50 new skills per day in early 2026 to over 500 per day. ClawHub, a major registry, lacks systematic review. Snyk's audit of 3,984 skills found 1,467 malicious payloads.

The market is growing wildly; NVIDIA is building the toll booth.

Who should use it

Individual developers: If you use Claude Code, Codex CLI, Gemini CLI and often install third-party skills — add skillspector scan to your installation routine. It takes 5 seconds.

Teams: Maintain an internal skill library? Run --format markdown to generate an audit report, which is more reliable than saying "I've looked at the code."

Platforms: Running a skill marketplace or registry? Plug the SARIF output into your CI, and use the recommendation as a gate — SAFE to allow, CAUTION for manual review, DO_NOT_INSTALL to block directly.

One more issue

SkillSpector scans the skill itself. But Agent security risks come from more than just skills.

MCP server permission boundaries, runtime sandbox isolation for Agents, interaction risks when multiple skills are combined — none of these can be solved by a pre-install scanner alone.

The 26.1% vulnerability rate shows that the skill ecosystem indeed needs security checks. But the security check is only the first door, not the whole building.

A complete picture of Agent security probably needs three layers: pre-install scanning (what SkillSpector does), runtime sandbox (limiting what the Agent can actually do), and post-action auditing (recording what the Agent did).

Right now, only the first layer exists. The latter two are still blank.

NVIDIA has started something. But the road is long.

GitHub: https://github.com/NVIDIA/SkillSpector

Data sources: Liu et al. 2026 empirical study (42,447 samples); GitHub API (9,933 Stars / 784 Forks, 2026-06-24); NVIDIA Technical Blog

Every 4th AI Skill You Install May Be Toxic

Why skills are more dangerous than plugins

How SkillSpector scans

Practical: 5 minutes to get started

What makes it better than manual review

Why NVIDIA built this

Who should use it

One more issue

评论