The ClawHavoc Campaign: How Malicious AI Agent Skills Exposed the Verification Gap

Analysis of the ClawHavoc campaign reveals 1,200 malicious AI agent skills infiltrated OpenClaw. Formal verification emerges as the only viable defense.

In January 2026, the ClawHavoc campaign injected 1,200 malicious AI agent skills into the OpenClaw marketplace, exposing a critical blind spot in agent security. Traditional defenses failed: pattern matching, YARA rules, and LLM-as-judge heuristics all missed the threat, and researchers later catalogued 6,487 malicious agent tools that even VirusTotal could not identify. The response from the security community has crystallized around formal verification. SkillFortify, a new open-source tool, mathematically proves that agent skills cannot exceed their declared capabilities, offering the first provable defense against this emerging class of supply chain attacks. For builders shipping code daily, this represents a fundamental shift from probabilistic scanning to deterministic security guarantees.

What Exactly Happened During the ClawHavoc Campaign?

The ClawHavoc campaign began in early January 2026 when threat actors uploaded carefully crafted malicious skills to the OpenClaw agent marketplace. These were not obvious trojans or suspicious binaries. Instead, they presented as legitimate utility skills with plausible descriptions and functional code surfaces. The attackers exploited a fundamental trust assumption: that skills do what they claim. By embedding malicious behavior within seemingly benign capability declarations, they bypassed every existing detection layer. The 1,200 skills remained undetected for weeks, accumulating downloads and integration into production workflows. When security researchers finally identified the campaign, they discovered the skills employed polymorphic code structures that mutated their signatures while maintaining identical malicious functionality. This rendered traditional hash-based and pattern-matching defenses obsolete, as each instance appeared unique to static analysis tools while executing identical exfiltration payloads.

Why Did Traditional Security Tools Fail Against Agent Skills?

Antivirus engines and static analysis tools built for conventional software failed because AI agent skills represent a different computational model. Traditional malware detection relies on signatures, behavioral heuristics, or anomaly detection against known bad patterns. Agent skills, however, are declarative programs that execute within LLM contexts, making their behavior dependent on prompt injection and context manipulation. The 6,487 malicious tools catalogued post-ClawHavoc used prompt-based payload delivery and context-aware execution triggers. VirusTotal and similar platforms scan for file hashes, suspicious API calls, and known exploit patterns. They cannot interpret the semantic intent of a skill manifest or the potential for prompt engineering to weaponize legitimate APIs. This architectural mismatch created a detection gap that heuristic scanning cannot bridge. The tools looked for executable code anomalies, but the threat lived in the interaction between natural language prompts and API capabilities.

How Do Malicious AI Agent Skills Differ from Conventional Malware?

Conventional malware operates through direct code execution, file system manipulation, or network exfiltration. Malicious AI agent skills operate through capability abuse and context poisoning. A malicious skill might legitimately claim to “read local files” but use that capability to exfiltrate sensitive data when triggered by specific LLM prompts. The code itself looks legitimate. The manifest accurately describes the permissions. The maliciousness emerges only in the interaction between the skill, the LLM, and the user context. This distinction matters because existing security infrastructure assumes malicious intent manifests in code structure. Agent skills break this assumption by weaponizing legitimate capabilities through semantic manipulation. They require understanding not just what code runs, but what the code claims to do versus what it can actually do. A skill that reads environment variables might use them to reconstruct API keys, or a file reader might parse SSH keys when it detects specific filenames.

What Is Formal Verification and Why Does It Matter for Agents?

Formal verification applies mathematical proof to software behavior. Instead of testing for known bad patterns, it defines formal specifications of what a program should do and proves the implementation cannot violate those bounds. For AI agent skills, this means creating a mathematical model of declared capabilities and verifying the skill’s code cannot exceed those declarations. SkillFortify implements this by parsing skill manifests and building a formal model of permitted operations using Satisfiability Modulo Theories (SMT) solvers. It then analyzes the skill’s implementation to prove it cannot access APIs, file paths, or network endpoints outside the declared set. This approach eliminates the probabilistic uncertainty of heuristic scanning. When SkillFortify certifies a skill as safe, it provides a mathematical guarantee, not a confidence score. This matters because agent skills often run with ambient authority, inheriting the permissions of the host agent rather than operating in restricted sandboxes.
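The article does not publish SkillFortify's internals, but the core capability-containment idea can be sketched in a few lines. The following is a toy illustration, assuming the op:resource capability strings shown in the skill-lock.json example later in this piece; it is not the real verifier, which reasons over all execution paths rather than individual requests:

```python
from fnmatch import fnmatch

# Hypothetical capability model: a capability is "operation:resource-glob",
# e.g. "read_file:/data/*". A requested operation is contained only if some
# declared capability covers both the operation and the resource.
def is_contained(requested: str, declared: list[str]) -> bool:
    op, _, resource = requested.partition(":")
    for cap in declared:
        cap_op, _, cap_pattern = cap.partition(":")
        if op == cap_op and fnmatch(resource, cap_pattern):
            return True
    return False

declared = ["read_file:/data/*", "call_api:weather_service"]
assert is_contained("read_file:/data/report.csv", declared)
assert not is_contained("read_file:/etc/passwd", declared)        # outside declared paths
assert not is_contained("write_file:/data/report.csv", declared)  # undeclared operation
```

A real SMT-based verifier would encode the same containment question as a formula and ask the solver for a counter-example, but the pass/fail contract is the same.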

How Does SkillFortify Verify Skill Capabilities?

SkillFortify operates through a pipeline of static analysis and theorem proving. First, it parses the skill.md or manifest file to extract declared capabilities, API scopes, and resource boundaries. It then performs symbolic execution on the skill’s code, building a mathematical representation of all possible execution paths. Five core theorems govern this process: capability containment ensures no access beyond declared APIs; resource boundedness guarantees finite resource usage; termination guarantee proves the skill halts; non-interference prevents information leaks between domains; and provenance integrity validates code origin. The tool feeds these into an SMT solver to check for violations. If the solver cannot find a counter-example where the skill exceeds its declared capabilities, the verification passes. This process runs locally, requires no network access, and completes in seconds for typical skills. The output is a binary pass/fail with a machine-readable proof certificate.
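Full symbolic execution is beyond a blog snippet, but the first step of such a pipeline, enumerating what a skill's code can call and diffing that against the manifest, can be approximated with a static walk over the syntax tree. Everything below (the skill source, the allowlist) is invented for illustration:

```python
import ast

# Toy static pass (not real symbolic execution): collect every function or
# method name a skill's Python source can call, then compare against an
# allowlist derived from the manifest.
def called_names(source: str) -> set[str]:
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            f = node.func
            if isinstance(f, ast.Name):
                names.add(f.id)
            elif isinstance(f, ast.Attribute):
                names.add(f.attr)
    return names

skill_source = """
def run(path):
    data = open(path).read()
    send(data)          # undeclared network call
"""
allowed = {"open", "read"}
violations = called_names(skill_source) - allowed
print(sorted(violations))  # → ['send']
```

Symbolic execution strengthens this sketch by proving the property over every feasible path, including calls reached only under specific inputs, which a name-level scan cannot guarantee.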

How Do Heuristic Scanners Compare to Formal Verification?

The security community deployed a dozen heuristic scanning tools in response to ClawHavoc, including pattern matchers, LLM-as-judge systems, and YARA rules. All share the same fundamental limitation: they look for symptoms of maliciousness rather than proving safety. Heuristic tools require constant updates to threat signatures and suffer from false positives that disrupt development workflows. Formal verification validates against the skill’s own declared contract, requiring no external threat intelligence.

| Feature | Heuristic Scanning | SkillFortify Formal Verification |
| --- | --- | --- |
| Detection method | Pattern matching, behavioral analysis | Mathematical proof of capability bounds |
| False positive rate | Variable, often high | Zero (in benchmarks) |
| Coverage | Known threats only | All possible execution paths |
| Guarantee | "No findings" does not mean "no risk" | Provable safety within declared bounds |
| Performance | Fast | Fast (seconds per skill) |
| Required updates | Signature databases | None (theorem-based) |

The choice between probabilistic detection and mathematical proof defines the new security boundary for agent ecosystems.

What Do the SkillFortify Benchmarks Reveal?

Researchers evaluated SkillFortify against a dataset of 540 skills, evenly split between 270 malicious and 270 benign samples drawn from post-ClawHavoc analysis. The tool achieved an F1 score of 96.95% with zero false positives: no legitimate skills were incorrectly flagged, and since that puts precision at exactly 100%, the F1 score corresponds to a recall of roughly 94%, a miss rate of about 6%. The malicious samples included polymorphic variants designed to evade signature detection, prompt injection payloads, and capability escalation attacks. The zero false positive rate proves critical for adoption; developers will not tolerate security tools that block legitimate workflows. The missed skills primarily had undecidable capability boundaries due to dynamic code loading, a known limitation of static analysis that the tool flags explicitly rather than silently passing. This transparency allows security teams to quarantine edge cases for manual review rather than accepting unknown risk.
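These figures constrain each other: with zero false positives, precision is exactly 100%, so the reported F1 score pins down recall. A quick check of what 96.95% implies:

```python
# F1 = 2PR / (P + R). With zero false positives, P = 1.0, so F1 = 2R / (1 + R).
# Solving for R shows the recall (and miss rate) implied by the benchmark.
def recall_from_f1_at_perfect_precision(f1: float) -> float:
    return f1 / (2.0 - f1)

recall = recall_from_f1_at_perfect_precision(0.9695)
miss_rate = 1.0 - recall
print(f"recall ≈ {recall:.4f}, miss rate ≈ {miss_rate:.2%}")  # → recall ≈ 0.9408, miss rate ≈ 5.92%
```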

Why Did VirusTotal Miss 6,487 Malicious AI Agent Skills?

VirusTotal aggregates dozens of antivirus engines and sandbox analyses, yet it missed 6,487 malicious agent tools catalogued by security researchers in February 2026. These tools exploited the semantic gap between traditional software and agent-specific execution models. They contained no executable binaries, no suspicious imports, and no network signatures. Instead, they consisted of JSON manifests, markdown descriptions, and Python glue code that appeared benign in isolation. The malicious behavior emerged only when the LLM interpreted the skill’s instructions in specific contexts. Traditional sandboxes execute code and monitor system calls. They cannot simulate the nuanced interaction between an LLM’s reasoning engine and a skill’s prompt templates. This blind spot represents a new attack surface that existing infrastructure cannot monitor without formal semantic analysis. The ClawHavoc campaign proved that agent skills require verification tools that understand both code semantics and natural language intent.

What Are the Implications of CVE-2026-25253?

CVE-2026-25253 marks the first officially catalogued remote code execution vulnerability specifically targeting agent software. Unlike traditional RCEs that exploit memory corruption or injection flaws, this vulnerability abuses the skill execution runtime itself: attackers can craft malicious skills that escape the agent sandbox by exploiting the communication protocol between the LLM and the skill handler. This CVE validates the ClawHavoc threat model: agent skills are not just configuration files or scripts but active software components with their own vulnerability surface. For builders, this means treating skills with the same security rigor as Docker containers or npm packages. A CVE assigned specifically to agent software signals that the security industry recognizes these components as distinct from traditional libraries or plugins, and it implies that insurers and compliance frameworks will soon mandate specific security controls for agent skill deployment.

How Do You Implement SkillFortify in Production Workflows?

Integration starts with installation via pip: pip install skillfortify. For CI/CD pipelines, add the verify step before deployment. Run skillfortify scan . to discover all skills in your repository. This generates a dependency graph showing skill interdependencies and capability requirements. Next, run skillfortify verify skill.md against each skill manifest to generate proof certificates. For reproducible builds, use skillfortify lock to create a skill-lock.json file that pins exact skill versions and their verified capability hashes. This functions similarly to package-lock.json but includes security proofs. Finally, integrate skillfortify trust skill.md to compute trust scores based on code provenance and behavioral analysis. Block deployment if any skill fails verification or falls below your trust threshold. For OpenClaw projects, add these commands to your pre-commit hooks to catch issues before they reach the repository.

```shell
# Install SkillFortify and run the full verification workflow
pip install skillfortify
skillfortify scan .                  # discover skills and build the dependency graph
skillfortify verify my_skill.md      # prove declared capabilities are not exceeded
skillfortify lock                    # pin verified versions in skill-lock.json
skillfortify trust another_skill.md  # compute a provenance-based trust score
```
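As a concrete (hypothetical) CI integration, the commands above can be wrapped in a small gate script that fails the build when any skill manifest fails verification. This sketch assumes the skillfortify CLI behaves as described in this article, exiting non-zero on a failed proof:

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical pre-deploy gate: verify every skill manifest in the repo and
# fail the build on the first skill that does not pass formal verification.
# Assumes a `skillfortify` binary on PATH with the semantics described above.
def gate(repo_root: str = ".") -> int:
    failures = []
    for manifest in sorted(Path(repo_root).rglob("*skill*.md")):
        result = subprocess.run(
            ["skillfortify", "verify", str(manifest)],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            failures.append(str(manifest))
    for path in failures:
        print(f"verification failed: {path}", file=sys.stderr)
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(gate())
```

Wired into a pre-commit hook or CI job, a non-zero exit blocks the merge, which is exactly the "block deployment of unverified skills" policy described above.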

What Is a Skill-Lock.json and Why Does Reproducibility Matter?

The skill-lock.json file generated by skillfortify lock serves as a cryptographic attestation of your agent’s skill configuration. It contains SHA-256 hashes of verified skill files, their declared capability boundaries, and the mathematical proofs generated during formal verification. This file ensures that every deployment uses exactly the same skill versions with exactly the same security properties. Without this, you face the risk of supply chain attacks where a skill updates silently to include malicious capabilities. The lock file also enables deterministic rollbacks; if a new skill version fails verification, you can revert to the previous locked state with guaranteed identical behavior. For teams using OpenClaw, this replaces fragile manual auditing with automated, cryptographically verifiable skill management. The lock file should be committed to version control alongside your code to maintain a complete audit trail of capability changes.

```json
{
  "locked_skills": {
    "my_skill_v1.0.0": {
      "hash": "sha256:abcdef12345...",
      "capabilities": ["read_file:/data/*", "call_api:weather_service"],
      "verification_proof_id": "proof_id_123",
      "timestamp": "2026-02-27T10:00:00Z"
    },
    "another_skill_v2.1.0": {
      "hash": "sha256:fedcba98765...",
      "capabilities": ["write_log:/var/log/agent.log"],
      "verification_proof_id": "proof_id_456",
      "timestamp": "2026-02-27T10:00:00Z"
    }
  },
  "skillfortify_version": "1.0.0"
}
```
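Checking a deployed skill against its lock entry is then a plain hash comparison. A minimal sketch, assuming the sha256:&lt;hexdigest&gt; format shown in the lock file above (the file contents here are illustrative):

```python
import hashlib

# Compare a skill file's bytes against its skill-lock.json "hash" entry,
# which uses the "sha256:<hexdigest>" format from the example lock file.
def matches_lock(skill_bytes: bytes, locked_hash: str) -> bool:
    algo, _, digest = locked_hash.partition(":")
    assert algo == "sha256", "only sha256 appears in the example lock file"
    return hashlib.sha256(skill_bytes).hexdigest() == digest

content = b"# my_skill v1.0.0\n"
entry_hash = "sha256:" + hashlib.sha256(content).hexdigest()
assert matches_lock(content, entry_hash)         # untouched skill passes
assert not matches_lock(b"# tampered\n", entry_hash)  # silent update is caught
```

This is the mechanism that turns a silent upstream skill update into a hard deployment failure instead of an unnoticed capability change.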

How Does Provenance Tracking Work with SkillFortify Trust?

The skillfortify trust command computes a composite trust score combining code provenance and behavioral verification. It checks the skill’s source repository for signed commits, maintainer reputation, and historical security issues. It then cross-references this against the formal verification results to detect mismatches between claimed authorship and actual code structure. A skill from an unknown author that passes formal verification receives a lower trust score than an identical skill from a verified maintainer with a history of secure contributions. This creates a defense-in-depth strategy: even if a skill passes capability verification, low provenance scores trigger additional review. The trust score outputs as a 0-100 metric with detailed breakdowns, allowing you to set policy thresholds automatically in your deployment pipeline. You might require scores above 80 for production deployment while allowing scores above 50 in development environments with additional monitoring.
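The exact scoring formula is not documented here, so the following is an invented illustration of how provenance signals and verification results might combine into a 0-100 score; the weights and signal names are assumptions, not SkillFortify's real model:

```python
# Illustrative composite trust score. Weights and signals are invented for
# this sketch; the article only states that provenance and formal
# verification both factor into the 0-100 metric.
def trust_score(signed_commits: bool, maintainer_verified: bool,
                past_incidents: int, verification_passed: bool) -> int:
    score = 0
    score += 30 if verification_passed else 0   # proof is necessary but not sufficient
    score += 25 if signed_commits else 0
    score += 30 if maintainer_verified else 0
    score += max(0, 15 - 5 * past_incidents)    # security history erodes trust
    return score

# A verified skill from an unknown author scores lower than the identical
# skill from an established maintainer, matching the defense-in-depth policy.
unknown_author = trust_score(False, False, 0, True)
known_maintainer = trust_score(True, True, 0, True)
print(unknown_author, known_maintainer)  # → 45 100
```

Whatever the real weighting, the policy shape is the same: compare the score against per-environment thresholds (e.g. 80 for production, 50 for development) in the deployment pipeline.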

What Are the Limitations of LLM-as-Judge Security Tools?

Several post-ClawHavoc security tools employ LLM-as-judge architectures, using large language models to analyze skill code for malicious intent. These suffer from fundamental reliability issues. First, they inherit the non-determinism of the underlying LLM; the same skill might pass or fail depending on temperature settings or prompt variations. Second, they require the LLM to recognize malicious patterns it may not have been trained on, creating a moving-target problem. Third, they introduce circular dependencies: using an AI to secure AI agents creates a recursion vulnerability where the judge itself might be compromised. Most critically, LLM judges provide probabilistic assessments of suspicion rather than binary guarantees. They cannot prove a skill safe, only that it lacks obvious malicious indicators. This leaves the "no findings does not mean no risk" caveat intact, providing false confidence to security teams.

What Should Builders Watch for in the Next Wave of Malicious AI Agent Skills?

The post-ClawHavoc landscape will see attackers targeting the verification tools themselves. Expect attempts to craft skills with formally verifiable surface capabilities but exploitable side channels, such as timing attacks or speculative execution leaks within the agent runtime. Watch for skills that exploit the gap between declared capabilities and actual LLM behavior, particularly prompt injection attacks that weaponize legitimate file-read capabilities against sensitive data. Monitor for supply chain attacks targeting the formal verification tools; if attackers can compromise the SMT solver or theorem prover, they can falsify safety proofs. Finally, anticipate regulatory responses mandating formal verification for marketplace submissions, similar to how app stores now require privacy manifests. Builders should also watch for the emergence of adversarial skills designed to pass verification while maintaining dormant malicious payloads activated by future LLM updates.

How Do You Migrate from Pattern Matching to Formal Verification?

Migration requires shifting from blacklist-based security to contract-based verification. Start by auditing existing skills with skillfortify scan to identify shadow capabilities. Many skills request broader permissions than they use; formal verification will flag these discrepancies. Update your CI/CD pipelines to block deployment of unverified skills, but provide a grace period for developers to add proper capability declarations to their manifests. Train teams to write explicit skill.md files that declare exact API endpoints, file paths, and network scopes rather than using wildcards. Integrate the skill-lock.json into your version control to track capability changes over time. Finally, establish internal policies requiring trust scores above 70 for production deployment, with manual review for lower scores. This migration parallels the shift from manual server management to Infrastructure as Code, requiring cultural adjustment alongside technical implementation.
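The shadow-capability audit described above reduces to a set difference between what a skill declares and what it actually uses. A sketch, with the same illustrative op:resource capability strings used elsewhere in this article:

```python
# Shadow-capability audit: declared-but-unused permissions are candidates for
# trimming; used-but-undeclared ones are hard failures that block deployment.
def audit(declared: set[str], observed: set[str]) -> dict[str, set[str]]:
    return {
        "shadow": declared - observed,      # over-broad declarations to remove
        "undeclared": observed - declared,  # capability violations
    }

report = audit(
    declared={"read_file:/data/*", "call_api:weather_service",
              "write_log:/var/log/agent.log"},
    observed={"read_file:/data/*", "call_api:weather_service"},
)
print(report)  # shadow: the unused write_log grant; undeclared: empty
```

Running this diff across a repository during the grace period gives teams a concrete worklist: trim the shadow set, then turn undeclared findings into pipeline-blocking errors.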

Why Is the OpenClaw Marketplace Particularly Vulnerable?

The OpenClaw marketplace hosts thousands of community-contributed skills with minimal review compared to traditional package managers. Unlike PyPI or npm, which have established security teams and malware scanning, agent marketplaces often prioritize rapid iteration and LLM compatibility. The ClawHavoc campaign exploited this velocity-focused culture. Skills in OpenClaw execute with ambient authority by default, meaning they inherit the permissions of the running agent rather than operating in isolated sandboxes. This architectural choice, which enables the flexibility that makes OpenClaw popular, also creates a high-value target for attackers. The marketplace’s integration with Claude Code and MCP servers further amplifies the blast radius, as a single malicious skill can compromise multiple agent runtimes across different platforms. The lack of mandatory capability declarations in early OpenClaw versions allowed the ClawHavoc skills to request excessive permissions without raising automated flags.

Frequently Asked Questions

What was the ClawHavoc campaign?

The ClawHavoc campaign was a January 2026 security incident where 1,200 malicious skills infiltrated the OpenClaw agent marketplace. These skills bypassed traditional detection methods including pattern matching and YARA rules, demonstrating fundamental vulnerabilities in heuristic scanning approaches for AI agent components. The campaign remained undetected for weeks, allowing widespread distribution before security researchers identified the threat. Analysis revealed polymorphic code structures designed specifically to evade signature-based detection while maintaining consistent malicious functionality.

How does SkillFortify differ from antivirus scanning?

SkillFortify uses formal verification rather than pattern matching. Instead of checking for known bad signatures, it mathematically proves that a skill cannot exceed its declared capabilities. Five theorems guarantee soundness, ensuring zero false positives and provable security bounds. Unlike antivirus tools that require constant signature updates, formal verification validates skills against their own declared contracts. This eliminates the "no findings does not mean no risk" caveat inherent to heuristic approaches, providing binary pass/fail determinations backed by mathematical proof.

What is CVE-2026-25253?

CVE-2026-25253 is the first remote code execution vulnerability assigned specifically to agent software. It represents a new class of security threats targeting AI agent runtimes, distinguishing traditional software vulnerabilities from agent-specific attack vectors. Unlike conventional RCEs that exploit memory corruption, this vulnerability abuses the skill execution runtime and LLM-skill communication protocols. Its assignment signals that security researchers and vendors now recognize AI agent components as distinct software categories requiring specialized vulnerability tracking and remediation procedures.

Can formal verification detect all malicious AI agent skills?

In benchmark testing against 540 skills (270 malicious, 270 benign), SkillFortify achieved a 96.95% F1 score with zero false positives, which at 100% precision corresponds to a recall of roughly 94%. While no security tool offers absolute guarantees, formal verification provides mathematical proof of capability boundaries rather than probabilistic detection. The skills it missed had undecidable capability boundaries due to dynamic code loading, which the tool explicitly flags rather than silently passing. This transparency allows security teams to make informed risk decisions about edge cases.

How do I implement SkillFortify in my OpenClaw project?

Install via pip: pip install skillfortify. Run skillfortify scan . to analyze your project, skillfortify verify skill.md to check declarations against implementations, and skillfortify lock to generate reproducible skill-lock.json files. Integration supports Claude Code skills, MCP servers, and OpenClaw manifests. Add the verify step to your CI/CD pipeline to block deployment of unverified skills. Use skillfortify trust skill.md to compute trust scores and establish policy thresholds for automatic deployment approval.

Conclusion

The ClawHavoc campaign proved that heuristic scanning cannot secure the agent skill supply chain: 1,200 malicious skills sailed past every pattern matcher, YARA rule, and LLM judge deployed against them, and thousands more evaded VirusTotal entirely. Formal verification, as implemented by SkillFortify, is currently the only defense that replaces probabilistic detection with mathematical proof of capability bounds. Builders should treat skill verification, lock files, and trust thresholds as baseline hygiene for agent deployments, the same way signed packages and lockfiles became baseline for conventional dependencies.