
Claude Adversarial Verification Skill

A Claude Code skill that performs rigorous adversarial verification across code, architecture, data, documentation, tests, and analysis using Chain-of-Verification (CoV) enhanced with abstractive red-teaming, hidden behavior probing, stress techniques, and tri-modal reasoning.

What it does

When invoked, this skill launches a skeptical verifier agent that follows a structured protocol:

Pre-verification (Steps 0–0b):

  • Identify what needs verification — code, architecture, data, documentation, tests, or analysis
  • Gather artifacts — the actual outputs to verify
  • Establish ground truth — what to verify against

Chain-of-Verification (Steps 1–2b):

  • Decompose artifacts into individual verifiable claims
  • Classify reasoning mode — deductive, inductive, or abductive per claim
  • Generate adversarial questions for each claim ("what would make this fail?")
  • Abstract to failure categories — find patterns, not just individual bugs

Deep Verification (Steps 3–3d):

  • Independently verify each claim by tracing actual paths
  • Probe for hidden behaviors — detect what the code doesn't advertise
  • Apply adversarial scaffold — suspicion modeling, attack selection, subtlety detection
  • Stress test — Existence Question, Scale Shift, Time Travel, Requirement Inversion

Reporting (Steps 4–5):

  • Report findings with reasoning-aware confidence scoring and anti-fabrication discipline
  • Survived verdicts — stress tests that hold are as valuable as those that break
  • Hypotheses — abductive findings reported separately, with alternatives and tests
  • Propose project doc updates — TODO.md, SPEC.md, PLAN.md (with user confirmation)
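The staged protocol above can be sketched as a single pass over claims. This is a hypothetical illustration only: the actual skill is a prompt-driven agent, and every function name below (`decompose`, `adversarial_questions`, `verify`) is invented for this sketch.

```python
# Hypothetical illustration of the protocol flow; the real skill is a
# prompt-driven agent, and every name here is invented for this sketch.

def decompose(artifact: str) -> list[str]:
    # Step 1: split an artifact into individually verifiable claims.
    return [c.strip() for c in artifact.split(".") if c.strip()]

def adversarial_questions(claim: str) -> list[str]:
    # Step 2: for each claim, ask what would make it fail.
    return [f"What input would make '{claim}' false?",
            f"What does '{claim}' silently assume?"]

def verify(artifact: str) -> dict[str, list[str]]:
    # Steps 3-5: each claim gets its own adversarial questions,
    # to be traced independently against ground truth.
    return {claim: adversarial_questions(claim) for claim in decompose(artifact)}
```

For example, `verify("The cache is bounded. Writes are atomic.")` yields two claims, each paired with its own adversarial questions.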

Verification Domains

| Domain | What it verifies | Ground truth |
| --- | --- | --- |
| Code | Source changes, logic, behavior | Tests, type system, spec |
| Architecture | Design decisions, spec coverage | Requirements, constraints, patterns |
| Data | Schemas, migrations, contracts | Production schema, validation rules |
| Documentation | Technical, process, and user-facing docs | Actual codebase, current API, git history |
| Tests | Test suite integrity and honesty | Production code, requirements, coverage reports |
| Analysis | Agent outputs, reports, docs | Source material, cited references |

Techniques

Abstractive Red-Teaming

Instead of finding individual bugs, identifies failure categories — general patterns that produce bugs repeatedly. Searches the entire codebase for instances of the same pattern (frequency assumptions, implicit ordering, stale state, missing completeness, silent fallthrough, assumed environment).
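The category-wide search can be sketched as a pattern scan over the codebase. This is an illustrative sketch only: the regexes below are toy examples of two of the listed categories, not the skill's actual detection logic.

```python
# Illustrative sketch of abstracting a bug into a failure category and then
# scanning the whole codebase for further instances. The regexes are toy
# examples, not the skill's actual detection logic.
import re
from pathlib import Path

FAILURE_CATEGORIES = {
    "silent fallthrough": re.compile(r"except\s*(Exception)?\s*:\s*pass"),
    "assumed environment": re.compile(r"os\.environ\[['\"]\w+['\"]\]"),
}

def scan(root: str) -> dict[str, list[str]]:
    """Return, per failure category, every Python file containing an instance."""
    hits: dict[str, list[str]] = {name: [] for name in FAILURE_CATEGORIES}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for name, pattern in FAILURE_CATEGORIES.items():
            if pattern.search(text):
                hits[name].append(str(path))
    return hits
```

The point of the abstraction step is that one confirmed bug seeds a category, and the scan then asks where else the same assumption was made.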

Hidden Behavior Probing

Detects behaviors the code doesn't advertise using four probing strategies: indirect probing (trace actual execution), scaffolded probing (chain findings), cross-reference probing (claims vs reality), and absence probing (what's NOT there).

Modular Adversarial Scaffold

Decomposes the adversarial process into five modules: suspicion modeling (what would a reviewer miss?), attack selection (highest-risk claims first), plan synthesis (multi-step trace chains), execution (actually read the code), and subtlety detection (code that hides complexity).

Stress Techniques

Inspired by Principles of Chaos Engineering, adapted for review. Four techniques with forced variety (minimum 3 per run, never repeat): Existence Question (should this exist at all?), Scale Shift (what at 10x? at zero?), Time Travel (what in 6 months?), Requirement Inversion (what if the opposite?). Produces Survived: yes/no verdicts — knowing what's robust is as valuable as knowing what's fragile.

Tri-Modal Reasoning

Each claim is classified by reasoning mode — deductive (verify against ground truth), inductive (generalize from 3+ instances), or abductive (generate best explanation from observations). Abductive findings are reported as hypotheses with alternative explanations and proposed tests, never as verified facts.

Anti-Fabrication Discipline

Before claiming something doesn't exist, you must state where you looked. Confidence scoring is tied to reasoning mode: deductive (80-100, source cited), inductive (60-79, 3+ instances), abductive (40-59, hypothesis with alternatives). Hard constraint: no score above 79 without citing specific file/line/doc.
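The confidence bands can be sketched as a lookup with the hard constraint enforced (a hypothetical helper, not part of the skill's code):

```python
# Sketch of the confidence bands above; a hypothetical helper, not the
# skill's actual code.

BANDS = {"deductive": (80, 100), "inductive": (60, 79), "abductive": (40, 59)}

def confidence_band(mode: str, cited_source: bool = False,
                    instances: int = 0) -> tuple[int, int]:
    if mode == "deductive" and not cited_source:
        # Hard constraint: no score above 79 without a file/line/doc citation.
        raise ValueError("deductive scores require a specific citation")
    if mode == "inductive" and instances < 3:
        raise ValueError("inductive findings require 3+ observed instances")
    return BANDS[mode]
```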

Agent Meta-Verification

When reviewing output from another AI agent, checks for: sycophantic deference, hidden agenda, anchoring bias, confabulated confidence, premature convergence, and evidence cherry-picking.

Install

Copy the skill directory:

cp -r skills/adversarial-verify ~/.claude/skills/

Or clone and copy:

git clone https://github.com/fullo/claude-adversarial-skill.git
cp -r claude-adversarial-skill/skills/adversarial-verify ~/.claude/skills/

Or install from the marketplace (recommended):

# Add the marketplace (once)
claude plugin marketplace add fullo/claude-plugins-marketplace

# Install the plugin
claude plugin install adversarial-verify@fullo-plugins

Update

claude plugin update adversarial-verify@fullo-plugins

The plugin system uses git commit hashes as versions. There is no automatic update notification: run the command above periodically to stay current.

Usage

In Claude Code, type:

/adversarial-verify

Or ask naturally:

"run an adversarial review on my recent changes"
"CoV check the last commit"
"verify this code with total skepticism"
"verify the PLAN.md against the SPEC.md"
"adversarial check on this migration"
"verify this agent's analysis report"
"look for systemic failure patterns in the codebase"
"probe this function for hidden behaviors"
"verify the README matches the actual install process"
"verify the tests actually test what they claim"
"stress test the auth module"
"what happens at 10x scale?"
"check if the planning agent's output is biased"

What it catches

Code

  • Silent data corruption — values that look correct but aren't
  • Logic flaws — code that passes simple tests but fails edge cases
  • Initialization order bugs — field A used before field B is set
  • Concurrent modification — adding to a list while iterating it
  • State leaks — data persisting across frames/calls when it shouldn't
  • Boundary conditions — off-by-one, coordinate system errors
  • Resource exhaustion — unbounded lists, missing cleanup
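A minimal instance of the concurrent-modification class above, as a sketch (the entity model is invented for illustration):

```python
# Removing from a list while iterating it silently skips elements instead
# of failing loudly -- the kind of bug that passes a casual read.

def remove_expired(entities: list[dict]) -> None:
    for e in entities:            # BUG: iterates the list being mutated
        if e["expired"]:
            entities.remove(e)    # shifts later elements left; one is skipped

def remove_expired_safe(entities: list[dict]) -> list[dict]:
    return [e for e in entities if not e["expired"]]  # rebuild, don't mutate
```

With `[expired, expired, live]`, the buggy version leaves one expired entity behind: removing index 0 slides the second expired entity into a slot the loop has already passed.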

Architecture

  • Spec drift — implementation diverges from SPEC.md
  • Missing constraints — PLAN.md doesn't address known edge cases
  • Over-engineering — abstraction without justification
  • Dependency risk — new deps without evaluation
  • Breaking changes — API contract violations

Data

  • Schema inconsistency — migration doesn't match model
  • Data loss risk — destructive migration without backup
  • Constraint gaps — missing NOT NULL, FK, uniqueness
  • Backward compat — old code reading new schema

Documentation

  • Stale instructions — install/setup steps that no longer work
  • API drift — documented endpoints don't match implementation
  • Missing docs — new features with no documentation
  • Broken examples — code samples that don't compile or run
  • Misleading error messages — error text doesn't match error condition
  • Version mismatch — docs reference old versions or deprecated features
  • Orphaned references — links to removed files or dead URLs
  • UI copy drift — help text diverges from actual behavior

Tests

  • Tautological tests — assertions that are always true regardless of code
  • Mock leakage — tests verify the mock, not the actual behavior
  • Coverage lies — line-covered but branch-untested code
  • Missing negative tests — only happy path tested
  • Fragile assertions — pass by coincidence (order, timing, locale)
  • Test-code drift — tests written for a previous code version
  • Flaky indicators — sleep(), retry, @Ignore/skip
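Minimal instances of the tautological-test and mock-leakage classes, sketched with an invented function under test:

```python
# compute_total is deliberately wrong, yet both tests below pass: the first
# asserts a tautology, the second verifies the mock's own stub.
from unittest.mock import Mock

def compute_total(xs: list[int]) -> int:
    return sum(xs) + 1            # deliberate off-by-one bug

def test_tautology():
    result = compute_total([1, 2, 3])
    assert result == result       # always true: verifies nothing

def test_mock_leakage():
    gateway = Mock()
    gateway.charge.return_value = "ok"
    assert gateway.charge(100) == "ok"   # tests the mock, not the code
```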

Analysis

  • Hallucinated facts — claims without traceable source
  • Stale references — citing removed/renamed code
  • Logical leaps — conclusion doesn't follow from evidence
  • One-sided evidence — only supporting data, contradicting findings omitted

Agent Meta-Verification

  • Sycophantic deference — agrees without challenging assumptions
  • Hidden agenda — favors one approach without justification
  • Anchoring bias — first evidence disproportionately shapes conclusions
  • Confabulated confidence — high confidence on weak evidence
  • Premature convergence — jumps to one hypothesis
  • Evidence cherry-picking — selects only supporting evidence

Trust Integration

Optionally integrates with a multi-agent trust scoring system:

  • Each confirmed bug: -1 trust to the agent that wrote it
  • Each false positive: -1 trust to the verifier
  • Every 3 clean reviews: +1 trust to the developer

Track trust in .claude/agent-trust.json:

{
  "agents": {
    "dev": { "trust": 7, "clean_commits": 0 },
    "test": { "trust": 7, "clean_commits": 0 }
  }
}
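The update rules can be sketched against the JSON shape above. This is a hypothetical helper: `apply_review` is not part of the skill, and treating one agent as author and another as verifier is an assumed mapping.

```python
# Hypothetical sketch of the trust update rules, operating on the
# agent-trust.json shape shown above.

def apply_review(state: dict, author: str, verifier: str,
                 confirmed_bugs: int, false_positives: int) -> dict:
    agents = state["agents"]
    agents[author]["trust"] -= confirmed_bugs      # -1 per confirmed bug
    agents[verifier]["trust"] -= false_positives   # -1 per false positive
    if confirmed_bugs == 0:                        # a clean review
        agents[author]["clean_commits"] += 1
        if agents[author]["clean_commits"] == 3:   # every 3 clean reviews: +1
            agents[author]["trust"] += 1
            agents[author]["clean_commits"] = 0
    else:
        agents[author]["clean_commits"] = 0
    return state
```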

Compatibility

Follows the Agent Skills format and works with Claude Code, Cursor, Windsurf, Cline, and other compatible agents.

References

Origin

Extracted from the Rainbow Climb game development project, where it was used to catch critical bugs including:

  • Timer-based continuous fire (shootTimer never reset)
  • Patrol boundary flip-flop (velocity inverted every frame)
  • Missing collision bounds (collectibles had 0x0 rectangles)
  • Shield absorption blocking subsequent projectile checks

License

MIT
