
Claude Adversarial Verification Skill

A Claude Code skill that performs rigorous adversarial verification across code, architecture, data, documentation, tests, and analysis using Chain-of-Verification (CoV) enhanced with abstractive red-teaming, hidden behavior probing, stress techniques, and tri-modal reasoning.

What it does

When invoked, this skill launches a skeptical verifier agent that follows a structured protocol:

Pre-verification (Steps 0–0b):

  • Identify what needs verification — code, architecture, data, documentation, tests, or analysis
  • Gather artifacts — the actual outputs to verify
  • Establish ground truth — what to verify against

Chain-of-Verification (Steps 1–2b):

  • Decompose artifacts into individual verifiable claims
  • Classify reasoning mode — deductive, inductive, or abductive per claim
  • Generate adversarial questions for each claim ("what would make this fail?")
  • Abstract to failure categories — find patterns, not just individual bugs

Deep Verification (Steps 3–3d):

  • Independently verify each claim by tracing actual paths
  • Probe for hidden behaviors — detect what the code doesn't advertise
  • Apply adversarial scaffold — suspicion modeling, attack selection, subtlety detection
  • Stress test — Existence Question, Scale Shift, Time Travel, Requirement Inversion

Reporting (Steps 4–5):

  • Report findings with reasoning-aware confidence scoring and anti-fabrication discipline
  • Survived verdicts — stress tests that hold are as valuable as those that break
  • Hypotheses — abductive findings reported separately, with alternatives and tests
  • Propose project doc updates — TODO.md, SPEC.md, PLAN.md (with user confirmation)
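The staged protocol above can be sketched as a single pass over claims. This is a hypothetical illustration only: the actual skill is a prompt-driven agent, and every function name below (`decompose`, `adversarial_questions`, `verify`) is invented for this sketch.

```python
# Hypothetical illustration of the protocol flow; the real skill is a
# prompt-driven agent, and every name here is invented for this sketch.

def decompose(artifact: str) -> list[str]:
    # Step 1: split an artifact into individually verifiable claims.
    return [c.strip() for c in artifact.split(".") if c.strip()]

def adversarial_questions(claim: str) -> list[str]:
    # Step 2: for each claim, ask what would make it fail.
    return [f"What input would make '{claim}' false?",
            f"What does '{claim}' silently assume?"]

def verify(artifact: str) -> dict[str, list[str]]:
    # Steps 3-5: each claim gets its own adversarial questions,
    # to be traced independently against ground truth.
    return {claim: adversarial_questions(claim) for claim in decompose(artifact)}
```

For example, `verify("The cache is bounded. Writes are atomic.")` yields two claims, each paired with its own adversarial questions.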

Verification Domains

| Domain | What it verifies | Ground truth |
| --- | --- | --- |
| Code | Source changes, logic, behavior | Tests, type system, spec |
| Architecture | Design decisions, spec coverage | Requirements, constraints, patterns |
| Data | Schemas, migrations, contracts | Production schema, validation rules |
| Documentation | Technical, process, and user-facing docs | Actual codebase, current API, git history |
| Tests | Test suite integrity and honesty | Production code, requirements, coverage reports |
| Analysis | Agent outputs, reports, docs | Source material, cited references |

Techniques

Abstractive Red-Teaming

Instead of finding individual bugs, identifies failure categories — general patterns that produce bugs repeatedly. Searches the entire codebase for instances of the same pattern (frequency assumptions, implicit ordering, stale state, missing completeness, silent fallthrough, assumed environment).
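The category-wide search can be sketched as a pattern scan over the codebase. This is an illustrative sketch only: the regexes below are toy examples of two of the listed categories, not the skill's actual detection logic.

```python
# Illustrative sketch of abstracting a bug into a failure category and then
# scanning the whole codebase for further instances. The regexes are toy
# examples, not the skill's actual detection logic.
import re
from pathlib import Path

FAILURE_CATEGORIES = {
    "silent fallthrough": re.compile(r"except\s*(Exception)?\s*:\s*pass"),
    "assumed environment": re.compile(r"os\.environ\[['\"]\w+['\"]\]"),
}

def scan(root: str) -> dict[str, list[str]]:
    """Return, per failure category, every Python file containing an instance."""
    hits: dict[str, list[str]] = {name: [] for name in FAILURE_CATEGORIES}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for name, pattern in FAILURE_CATEGORIES.items():
            if pattern.search(text):
                hits[name].append(str(path))
    return hits
```

The point of the abstraction step is that one confirmed bug seeds a category, and the scan then asks where else the same assumption was made.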

Hidden Behavior Probing

Detects behaviors the code doesn't advertise using four probing strategies: indirect probing (trace actual execution), scaffolded probing (chain findings), cross-reference probing (claims vs reality), and absence probing (what's NOT there).

Modular Adversarial Scaffold

Decomposes the adversarial process into five modules: suspicion modeling (what would a reviewer miss?), attack selection (highest-risk claims first), plan synthesis (multi-step trace chains), execution (actually read the code), and subtlety detection (code that hides complexity).

Stress Techniques

Inspired by Principles of Chaos Engineering, adapted for review. Four techniques with forced variety (minimum 3 per run, never repeat): Existence Question (should this exist at all?), Scale Shift (what at 10x? at zero?), Time Travel (what in 6 months?), Requirement Inversion (what if the opposite?). Produces Survived: yes/no verdicts — knowing what's robust is as valuable as knowing what's fragile.

Tri-Modal Reasoning

Each claim is classified by reasoning mode — deductive (verify against ground truth), inductive (generalize from 3+ instances), or abductive (generate best explanation from observations). Abductive findings are reported as hypotheses with alternative explanations and proposed tests, never as verified facts.

Anti-Fabrication Discipline

Before claiming something doesn't exist, you must state where you looked. Confidence scoring is tied to reasoning mode: deductive (80-100, source cited), inductive (60-79, 3+ instances), abductive (40-59, hypothesis with alternatives). Hard constraint: no score above 79 without citing specific file/line/doc.
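The confidence bands can be sketched as a lookup with the hard constraint enforced (a hypothetical helper, not part of the skill's code):

```python
# Sketch of the confidence bands above; a hypothetical helper, not the
# skill's actual code.

BANDS = {"deductive": (80, 100), "inductive": (60, 79), "abductive": (40, 59)}

def confidence_band(mode: str, cited_source: bool = False,
                    instances: int = 0) -> tuple[int, int]:
    if mode == "deductive" and not cited_source:
        # Hard constraint: no score above 79 without a file/line/doc citation.
        raise ValueError("deductive scores require a specific citation")
    if mode == "inductive" and instances < 3:
        raise ValueError("inductive findings require 3+ observed instances")
    return BANDS[mode]
```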

Agent Meta-Verification

When reviewing output from another AI agent, checks for: sycophantic deference, hidden agenda, anchoring bias, confabulated confidence, premature convergence, and evidence cherry-picking.

Install

Copy the skill directory:

cp -r skills/adversarial-verify ~/.claude/skills/

Or clone and copy:

git clone https://github.com/fullo/claude-adversarial-skill.git
cp -r claude-adversarial-skill/skills/adversarial-verify ~/.claude/skills/

Or install from the marketplace (recommended):

# Add the marketplace (once)
claude plugin marketplace add fullo/claude-plugins-marketplace

# Install the plugin
claude plugin install adversarial-verify@fullo-plugins

Update

claude plugin update adversarial-verify@fullo-plugins

The plugin system uses git commit hashes as versions. There is no automatic update notification: run the command above periodically to stay current.

Usage

In Claude Code, type:

/adversarial-verify

Or ask naturally:

"run an adversarial review on my recent changes"
"CoV check the last commit"
"verify this code with total skepticism"
"verify the PLAN.md against the SPEC.md"
"adversarial check on this migration"
"verify this agent's analysis report"
"look for systemic failure patterns in the codebase"
"probe this function for hidden behaviors"
"verify the README matches the actual install process"
"verify the tests actually test what they claim"
"stress test the auth module"
"what happens at 10x scale?"
"check if the planning agent's output is biased"

What it catches

Code

  • Silent data corruption — values that look correct but aren't
  • Logic flaws — code that passes simple tests but fails edge cases
  • Initialization order bugs — field A used before field B is set
  • Concurrent modification — adding to a list while iterating it
  • State leaks — data persisting across frames/calls when it shouldn't
  • Boundary conditions — off-by-one, coordinate system errors
  • Resource exhaustion — unbounded lists, missing cleanup
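A minimal instance of the concurrent-modification class above, as a sketch (the entity model is invented for illustration):

```python
# Removing from a list while iterating it silently skips elements instead
# of failing loudly -- the kind of bug that passes a casual read.

def remove_expired(entities: list[dict]) -> None:
    for e in entities:            # BUG: iterates the list being mutated
        if e["expired"]:
            entities.remove(e)    # shifts later elements left; one is skipped

def remove_expired_safe(entities: list[dict]) -> list[dict]:
    return [e for e in entities if not e["expired"]]  # rebuild, don't mutate
```

With `[expired, expired, live]`, the buggy version leaves one expired entity behind: removing index 0 slides the second expired entity into a slot the loop has already passed.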

Architecture

  • Spec drift — implementation diverges from SPEC.md
  • Missing constraints — PLAN.md doesn't address known edge cases
  • Over-engineering — abstraction without justification
  • Dependency risk — new deps without evaluation
  • Breaking changes — API contract violations

Data

  • Schema inconsistency — migration doesn't match model
  • Data loss risk — destructive migration without backup
  • Constraint gaps — missing NOT NULL, FK, uniqueness
  • Backward compat — old code reading new schema

Documentation

  • Stale instructions — install/setup steps that no longer work
  • API drift — documented endpoints don't match implementation
  • Missing docs — new features with no documentation
  • Broken examples — code samples that don't compile or run
  • Misleading error messages — error text doesn't match error condition
  • Version mismatch — docs reference old versions or deprecated features
  • Orphaned references — links to removed files or dead URLs
  • UI copy drift — help text diverges from actual behavior

Tests

  • Tautological tests — assertions that are always true regardless of code
  • Mock leakage — tests verify the mock, not the actual behavior
  • Coverage lies — line-covered but branch-untested code
  • Missing negative tests — only happy path tested
  • Fragile assertions — pass by coincidence (order, timing, locale)
  • Test-code drift — tests written for a previous code version
  • Flaky indicators — sleep(), retry, @Ignore/skip
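Minimal instances of the tautological-test and mock-leakage classes, sketched with an invented function under test:

```python
# compute_total is deliberately wrong, yet both tests below pass: the first
# asserts a tautology, the second verifies the mock's own stub.
from unittest.mock import Mock

def compute_total(xs: list[int]) -> int:
    return sum(xs) + 1            # deliberate off-by-one bug

def test_tautology():
    result = compute_total([1, 2, 3])
    assert result == result       # always true: verifies nothing

def test_mock_leakage():
    gateway = Mock()
    gateway.charge.return_value = "ok"
    assert gateway.charge(100) == "ok"   # tests the mock, not the code
```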

Analysis

  • Hallucinated facts — claims without traceable source
  • Stale references — citing removed/renamed code
  • Logical leaps — conclusion doesn't follow from evidence
  • One-sided evidence — only supporting data, contradicting findings omitted

Agent Meta-Verification

  • Sycophantic deference — agrees without challenging assumptions
  • Hidden agenda — favors one approach without justification
  • Anchoring bias — first evidence disproportionately shapes conclusions
  • Confabulated confidence — high confidence on weak evidence
  • Premature convergence — jumps to one hypothesis
  • Evidence cherry-picking — selects only supporting evidence

Trust Integration

Optionally integrates with a multi-agent trust scoring system:

  • Each confirmed bug: -1 trust to the agent that wrote it
  • Each false positive: -1 trust to the verifier
  • Every 3 clean reviews: +1 trust to the developer

Track trust in .claude/agent-trust.json:

{
  "agents": {
    "dev": { "trust": 7, "clean_commits": 0 },
    "test": { "trust": 7, "clean_commits": 0 }
  }
}
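The update rules can be sketched against the JSON shape above. This is a hypothetical helper: `apply_review` is not part of the skill, and treating one agent as author and another as verifier is an assumed mapping.

```python
# Hypothetical sketch of the trust update rules, operating on the
# agent-trust.json shape shown above.

def apply_review(state: dict, author: str, verifier: str,
                 confirmed_bugs: int, false_positives: int) -> dict:
    agents = state["agents"]
    agents[author]["trust"] -= confirmed_bugs      # -1 per confirmed bug
    agents[verifier]["trust"] -= false_positives   # -1 per false positive
    if confirmed_bugs == 0:                        # a clean review
        agents[author]["clean_commits"] += 1
        if agents[author]["clean_commits"] == 3:   # every 3 clean reviews: +1
            agents[author]["trust"] += 1
            agents[author]["clean_commits"] = 0
    else:
        agents[author]["clean_commits"] = 0
    return state
```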

Compatibility

Follows the Agent Skills format and works with Claude Code, Cursor, Windsurf, Cline, and other compatible agents.

References

Origin

Extracted from the Rainbow Climb game development project, where it was used to catch critical bugs including:

  • Timer-based continuous fire (shootTimer never reset)
  • Patrol boundary flip-flop (velocity inverted every frame)
  • Missing collision bounds (collectibles had 0x0 rectangles)
  • Shield absorption blocking subsequent projectile checks

License

MIT
