Move Kubernetes workloads from runc to gVisor (runsc), safely and reversibly.
agentmoat moves Kubernetes workloads from the default runc runtime to gVisor
(runsc), the user-space kernel that defends against the kernel-exploit step of
a container-escape chain. It scans a cluster, classifies every workload by
gVisor compatibility, generates a deterministic migration plan with a stable
hash, applies it via RuntimeClass, and rolls it back on demand. On EKS
specifically there is no managed switch: Bottlerocket does not ship runsc,
managed node groups have no gVisor toggle, and AWS does not officially support
gVisor. agentmoat fills that gap.
The threat model and CVE backdrop live in docs/threat-model.md.
From a fresh clone, against a local kind cluster with real gVisor preinstalled:

```sh
git clone https://github.com/0hardik1/agentmoat
cd agentmoat
make kind-up   # builds the gVisor-enabled kind node image, then creates the cluster
make e2e       # runs scan -> plan -> apply -> rollback end-to-end and asserts the results
```

`make e2e` exercises the full pipeline against a real runsc runtime and
probes patched pods for gVisor markers in dmesg. It is the fastest way to
see the tool work.
Against your own cluster (kubectl context already set):

```sh
make build                                                 # produces ./bin/agentmoat
./bin/agentmoat scan                                       # human-readable table
./bin/agentmoat scan --output json > scan.json             # versioned schema
./bin/agentmoat plan --scan scan.json --output json > plan.json
./bin/agentmoat apply --plan plan.json                     # dry-run by default
./bin/agentmoat apply --plan plan.json --dry-run=false     # actually mutate
./bin/agentmoat rollback --plan plan.json --dry-run=false
```

`scan` exits 0 when nothing is incompatible, 2 when at least one workload is.
apply and rollback are idempotent: re-running an applied plan reports every
step as already-applied and exits 0.
Build from source with Go 1.26+:

```sh
go install github.com/0hardik1/agentmoat/cmd/agentmoat@latest
go install github.com/0hardik1/agentmoat/cmd/agentmoat-mcp@latest
```

Once a release is tagged, prebuilt binaries (linux/darwin, amd64/arm64) and a
checksums file are attached to each GitHub release; a Homebrew formula and a
`kubectl agentmoat` krew plugin follow.

```
┌────────────┐      ┌────────────┐      ┌────────────┐      ┌────────────┐
│    scan    │ ───> │    plan    │ ───> │   apply    │ ───> │  rollback  │
│ ScanReport │      │ Migration  │      │   Apply    │      │  Rollback  │
│    (RO)    │      │    Plan    │      │   Result   │      │   Result   │
└────────────┘      └────────────┘      └────────────┘      └────────────┘
 cluster             pure function       strategic-merge     reverse, also
 client-go           over a Scan         patch, default      default dry-run
                     Report              dry-run
```

- `scan` is strictly read-only. It enumerates every Pod, Deployment, StatefulSet, DaemonSet, Job, and CronJob across the selected namespaces, classifies each one against the built-in rules, and emits a versioned `ScanReport`.
- `plan` is a pure function from a `ScanReport` to a `MigrationPlan`. Same scan in, same plan and same SHA-256 `planHash` out. The plan orders steps by risk (stateless first, no host-network first, etc.) and excludes any workload classified `incompatible`.
- `apply` is the only stage that mutates the cluster. It patches each pod template with `runtimeClassName` and the matching `runtime=gvisor:NoSchedule` toleration, stamps the namespace with `agentmoat.io/plan-hash`, emits a Kubernetes Event for each step it actually mutates, and appends an audit line for every step (dry-run included) to `~/.agentmoat/audit.jsonl`. It defaults to `--dry-run=true`. Because of the namespace annotation, re-running an applied plan reports every step as `already-applied` and exits 0.
- `rollback` walks the same plan in reverse, removes `runtimeClassName`, and clears the namespace annotation. It deliberately leaves the toleration in place (a toleration without a matching taint is harmless, and removing a specific toleration by JSON Patch index is fragile).

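
The determinism claim for `plan` can be illustrated with a minimal sketch. It assumes, hypothetically, that the hash is a SHA-256 over the JSON encoding of the steps after sorting them into a total order (risk score first, then namespace, kind, and name). The real canonicalisation lives in the agentmoat source and may differ; the `step` type and `planHash` function below are illustrative names, not the project's API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// step is a hypothetical, trimmed-down stand-in for a PlanStep.
type step struct {
	Kind      string `json:"kind"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
	RiskScore int    `json:"riskScore"`
}

// planHash sorts a copy of the steps into a total order and hashes their
// JSON encoding, so the order of the input slice does not affect the result.
func planHash(steps []step) (string, error) {
	sorted := append([]step(nil), steps...)
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.RiskScore != b.RiskScore {
			return a.RiskScore < b.RiskScore // lowest risk first
		}
		if a.Namespace != b.Namespace {
			return a.Namespace < b.Namespace
		}
		if a.Kind != b.Kind {
			return a.Kind < b.Kind
		}
		return a.Name < b.Name
	})
	raw, err := json.Marshal(sorted) // struct fields encode in declaration order
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := []step{
		{"StatefulSet", "default", "db", 40},
		{"Deployment", "default", "web", 10},
	}
	b := []step{a[1], a[0]} // same steps, different input order
	ha, _ := planHash(a)
	hb, _ := planHash(b)
	fmt.Println(ha == hb, len(ha)) // prints: true 64
}
```

Sorting before hashing is what makes the hash a function of the plan's content and not of enumeration order, which is the property the idempotency check relies on.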
The manual alternative is a sequence of kubectl get to enumerate, hand
classification against the gVisor docs, per-controller kubectl patch for
every Deployment/StatefulSet/DaemonSet, and a separate audit trail you build
yourself. agentmoat does the classification, orders the mutations, makes them
idempotent, records them, and undoes them.
14 stable rule IDs across three severities. Rule IDs are a public surface:
they appear in --output json, in --rules overrides, and in the
docs/compatibility-checklist.md table. The rule implementations live in
pkg/classifier/builtin_rules.go.

| Rule ID | Severity | What it inspects |
|---|---|---|
| `raw-socket` | error | `CAP_NET_RAW`, or `agentmoat.io/needs-raw-socket=true` |
| `host-network` | error | `pod.spec.hostNetwork=true` |
| `host-pid` | error | `pod.spec.hostPID=true` |
| `host-ipc` | error | `pod.spec.hostIPC=true` |
| `privileged` | error | Any container with `securityContext.privileged=true` |
| `ebpf` | error | Image hint (cilium, tetragon, falco) or `CAP_BPF` |
| `kvm-nested` | error | hostPath mount of `/dev/kvm` |
| `host-path-mount` | warn | Any hostPath volume |
| `gpu-passthrough` | warn | `nvidia.com/gpu` resource request or limit |
| `fuse-mount` | warn | CSI driver name containing `fuse`, or `AGENTMOAT_USES_FUSE=true` |
| `io-uring` | warn | Annotation `agentmoat.io/uses-iouring=true` |
| `perf-events` | warn | `CAP_PERFMON` or `CAP_SYS_ADMIN` |
| `network-throughput` | info | Image hint: nginx, envoy, haproxy, traefik (expect 20-40% overhead) |
| `syscall-heavy` | info | Image hint: redis, memcached (expect higher latency) |

Any error rule firing makes the workload incompatible and excludes it from
the plan. warn makes it review; the planner only includes review
workloads when --include-review is passed. info is purely advisory and
never blocks.
Override severities (or add rules) without recompiling via --rules <file.yaml>. See docs/compatibility-checklist.md.
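
The severity-to-verdict mapping described above can be sketched as a small fold over the severities of the rules that fired. This is an illustration of the stated behaviour, not the code in `pkg/classifier`; the `verdict` function is a hypothetical name, and `compatible` as the label for a workload with no blocking findings is an assumption (the README only names `review` and `incompatible`).

```go
package main

import "fmt"

// verdict folds the severities of all fired rules into one classification.
// "compatible" is an assumed label; "review" and "incompatible" are the
// names used in this README.
func verdict(severities []string) string {
	result := "compatible"
	for _, s := range severities {
		switch s {
		case "error":
			return "incompatible" // any error rule excludes the workload
		case "warn":
			result = "review" // planned only with --include-review
		}
		// "info" is treated as advisory here and leaves the verdict unchanged.
	}
	return result
}

func main() {
	fmt.Println(verdict(nil))                       // prints: compatible
	fmt.Println(verdict([]string{"info", "warn"}))  // prints: review
	fmt.Println(verdict([]string{"warn", "error"})) // prints: incompatible
}
```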
Shapes come straight from internal/schema/types.go;
elided fields are marked ....
`agentmoat scan --output json` (one `WorkloadResult` from `.spec.workloads`):

```json
{
  "kind": "Deployment",
  "namespace": "edge",
  "name": "frontdoor",
  "compatibility": "review",
  "reasons": [
    {
      "ruleId": "network-throughput",
      "severity": "info",
      "description": "Workload appears network-throughput-bound (image hint: nginx, envoy, haproxy, traefik); expect ~20-40% throughput overhead under gVisor's sandbox network stack.",
      "remediationUrl": "https://gvisor.dev/docs/architecture_guide/performance/"
    }
  ],
  "recommendation": "Benchmark under gVisor before opting in; consider host-network alternatives if throughput-critical.",
  "overhead": "Network throughput: 20-40%"
}
```

`agentmoat plan --output json` (one `PlanStep` and the top-level `planHash`):

```json
{
  "apiVersion": "agentmoat.io/v1alpha1",
  "kind": "MigrationPlan",
  "metadata": {
    "generatedAt": "2026-05-22T17:14:03Z",
    "planHash": "9f4b1c2d8a3e6f5b7c1e0d2a4f6b8e3c5d7a9f1b2e4d6a8c0f1b3e5d7a9c2e4f"
  },
  "spec": {
    "summary": {"total": 3, "included": 2, "excluded": 1},
    "options": {"runtimeClassName": "gvisor"},
    "steps": [
      {
        "order": 1,
        "target": {"kind": "Deployment", "namespace": "default", "name": "web"},
        "action": "set-runtime-class",
        "runtimeClassName": "gvisor",
        "addToleration": true,
        "waitFor": "Ready",
        "riskScore": 10,
        "notes": "Stateless Deployment fronted by a Service; safe to roll first."
      }
    ],
    "excluded": [...]
  }
}
```

`agentmoat apply --output json` (one `StepResult` and the apply envelope):

```json
{
  "apiVersion": "agentmoat.io/v1alpha1",
  "kind": "ApplyResult",
  "metadata": {
    "planHash": "9f4b1c2d8a3e6f5b7c1e0d2a4f6b8e3c5d7a9f1b2e4d6a8c0f1b3e5d7a9c2e4f",
    "dryRun": false
  },
  "spec": {
    "summary": {"total": 2, "applied": 2, "alreadyApplied": 0, "skipped": 0, "failed": 0},
    "steps": [
      {
        "order": 1,
        "target": {"kind": "Deployment", "namespace": "default", "name": "web"},
        "status": "applied",
        "patch": "{\"spec\":{\"template\":{\"spec\":{\"runtimeClassName\":\"gvisor\",\"tolerations\":[{\"key\":\"runtime\",\"operator\":\"Equal\",\"value\":\"gvisor\",\"effect\":\"NoSchedule\"}]}}}}"
      }
    ]
  }
}
```

- Read-only by default. `scan` and `plan` never mutate. They are safe to run against production from a CI job or a read-only kubeconfig.
- Dry-run by default. `apply` and `rollback` default to `--dry-run=true`: the patches are computed and surfaced in the `StepResult.patch` field, but nothing is sent to the API server. Mutating requires explicit `--dry-run=false`.
- Idempotent. Every `apply` writes the plan hash to the affected namespace as `agentmoat.io/plan-hash`. Re-running the same plan against the same cluster reports every step as `already-applied` and exits 0.
- Auditable. Every step appends one JSON line to `~/.agentmoat/audit.jsonl` (disable with `--no-audit`), including dry-run steps, which carry `dryRun: true`. Each step that actually mutates also emits one Kubernetes Event on the patched object (disable with `--no-events`).
- Deterministic exit codes. CI scripts can branch on them:

  | Code | Meaning |
  |---|---|
  | 0 | Success. Cluster matches requested state. |
  | 1 | Generic error (kubeconfig, network, malformed plan, etc). |
  | 2 | `scan` / `explain namespace` / `explain workload`: at least one `incompatible` workload. |
  | 3 | `apply` or `rollback`: partial outcome; idempotent re-run is safe. |
  | 4 | `verify`: `runtimeClassName` mismatch and/or in-pod probe did not find gVisor. |

  Full table in docs/exit-codes.md.

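
For CI jobs written in Go, branching on these exit codes might look like the sketch below. The code values and meanings are the ones listed above; the `meaning` and `exitCode` helpers are hypothetical names, and the binary path assumes `make build` has produced `./bin/agentmoat`.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// meaning maps the documented exit codes to short labels for CI logs.
func meaning(code int) string {
	switch code {
	case 0:
		return "success"
	case 1:
		return "generic error"
	case 2:
		return "incompatible workload found"
	case 3:
		return "partial apply or rollback"
	case 4:
		return "verify mismatch"
	default:
		return "unknown"
	}
}

// exitCode extracts the process exit code from the error returned by
// (*exec.Cmd).Run. It returns -1 when the process could not be started.
func exitCode(err error) int {
	if err == nil {
		return 0
	}
	var ee *exec.ExitError
	if errors.As(err, &ee) {
		return ee.ExitCode()
	}
	return -1
}

func main() {
	// Assumption: ./bin/agentmoat exists and a kubectl context is set.
	err := exec.Command("./bin/agentmoat", "scan", "--output", "json").Run()
	code := exitCode(err)
	fmt.Printf("agentmoat scan exited %d: %s\n", code, meaning(code))
}
```

A pipeline could, for example, treat code 2 from `scan` as a soft failure that opens a review ticket while treating code 1 as a hard failure.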
packer/eks-gvisor-al2023.pkr.hcl builds
an EKS-optimized AL2023 AMI with runsc and the containerd v2 shim
preinstalled, the containerd drop-in pre-staged, and the systrap platform
pinned (KVM is unavailable on EKS instances).

```sh
cd packer
packer init .
packer validate .
packer build .
```

Wire the resulting AMI into a self-managed node group or a Karpenter
EC2NodeClass, label the nodes runtime=gvisor, taint them
runtime=gvisor:NoSchedule, and apply deploy/runtimeclass.yaml. The
end-to-end recipe (CFN/Terraform snippets, IAM, Karpenter wiring) is tracked
in docs/eks-deployment.md.
make kind-up builds a custom kind worker image
(kind/Dockerfile.gvisor-node) that ships
/usr/local/bin/runsc and the containerd v2 shim. The cluster topology
(kind/cluster.yaml) is a stock control plane plus one
worker labelled runtime=gvisor; the RuntimeClass in
test/e2e/manifests/runtimeclass.yaml
uses handler: gvisor so pods carrying runtimeClassName: gvisor really
execute under runsc. Honors CLUSTER_NAME and KEEP_CLUSTER=1 for
iteration.

```sh
make kind-up            # idempotent; rebuilds the image only when missing
make e2e                # full scan -> plan -> apply -> rollback against the cluster
KEEP_CLUSTER=1 make e2e
make kind-down
```

| Command | Purpose |
|---|---|
| `agentmoat scan` | Enumerate workloads and classify gVisor compatibility. RO. |
| `agentmoat plan` | Produce a deterministic MigrationPlan from a scan. RO. |
| `agentmoat apply` | Patch workloads per the plan. Default dry-run. Idempotent. |
| `agentmoat rollback` | Reverse a previously applied plan. Default dry-run. |
| `agentmoat verify` | Confirm live pods match the plan's runtimeClassName. RO. |
| `agentmoat explain` | Embedded docs viewer; `explain namespace` / `explain workload` for deep scans. |
| `agentmoat version` | Print binary version and git SHA. |

| Flag | Default | Purpose |
|---|---|---|
| `--output` / `-o` | `table` | `table`, `json`, or `yaml`. `json`/`yaml` follow `agentmoat.io/v1alpha1`. |
| `--kubeconfig` | `$KUBECONFIG` | Path to kubeconfig. |
| `--context` | current-context | kubeconfig context to use. |
| `--namespace` / `-n` | (all) | Repeatable. Default: scan every namespace. |
| `--all-namespaces` / `-A` | `true` if `-n` unset | Explicit all-namespaces flag. |
| `--selector` / `-l` | (none) | Kubernetes label selector applied to every list call. |
| `--include-system` | `false` | Include `kube-system` and other `kube-*` namespaces. |
| `--rules` | (none) | Path to YAML overriding rule severities or adding rules. |
| `--explain` | `false` | Inline educational notes in supported output formats. |

| Flag | Command | Default | Purpose |
|---|---|---|---|
| `--scan` | `plan` | (inline) | Read a stored ScanReport from disk instead of scanning. |
| `--include-review` | `plan` | `false` | Also include review-class workloads in the plan. |
| `--runtime-class` | `plan` | `gvisor` | RuntimeClass name to patch onto migrated workloads. |
| `--plan` | apply/rollback | required | Path to a MigrationPlan JSON/YAML. |
| `--dry-run` | apply/rollback | `true` | Compute patches but do not mutate the cluster. |
| `--no-events` | apply/rollback | `false` | Do not emit Kubernetes Events per mutation. |
| `--no-audit` | apply/rollback | `false` | Do not append to `~/.agentmoat/audit.jsonl`. |
| `--in-pod-probe` | verify | `false` | Exec into a running pod and check dmesg/cmdline for gVisor markers. |

| Format | Use case |
|---|---|
| `table` | Default human-readable view; per-workload row with verdict and top reason. |
| `json` | Versioned, stable schema (`agentmoat.io/v1alpha1`). Pipe into jq or store as evidence. |
| `yaml` | Byte-identical to JSON after canonicalisation; convenient for review/diff. |

Why not just kubectl patch everything? You can. agentmoat is the same
operation written down: it classifies (so you do not silently patch
incompatible workloads), it orders by risk (stateless and no-host-network
first), it is idempotent (the namespace annotation makes re-runs safe), it
keeps an audit trail, and it has a one-command rollback.
Is it safe to run against production? scan and plan are strictly
read-only. apply and rollback default to --dry-run=true and surface the
exact strategic-merge patch in StepResult.patch before any mutation. A
read-only kubeconfig is sufficient to run scan and plan: ready-to-bind
RBAC for both modes ships under deploy/ as
clusterrole-readonly.yaml (scan / plan / verify) and clusterrole-apply.yaml
(apply / rollback).
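
To review what a dry run would send, the `patch` field can be decoded a second time, because it is a JSON document stored as a string. The sketch below relies only on the fields visible in the `apply` example earlier in this README; the `stepResult` and `podPatch` types and the `runtimeClassOf` helper are hypothetical names, not the definitions in `internal/schema/types.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stepResult mirrors only the fields shown in the README's apply example.
type stepResult struct {
	Status string `json:"status"`
	Patch  string `json:"patch"` // a JSON document stored as a string
}

// podPatch models the part of the strategic-merge patch shown in the example.
type podPatch struct {
	Spec struct {
		Template struct {
			Spec struct {
				RuntimeClassName string `json:"runtimeClassName"`
			} `json:"spec"`
		} `json:"template"`
	} `json:"spec"`
}

// runtimeClassOf decodes one step and returns the runtimeClassName that its
// patch would set on the pod template.
func runtimeClassOf(stepJSON []byte) (string, error) {
	var s stepResult
	if err := json.Unmarshal(stepJSON, &s); err != nil {
		return "", err
	}
	var p podPatch
	if err := json.Unmarshal([]byte(s.Patch), &p); err != nil {
		return "", err
	}
	return p.Spec.Template.Spec.RuntimeClassName, nil
}

func main() {
	step := []byte(`{"status":"applied","patch":"{\"spec\":{\"template\":{\"spec\":{\"runtimeClassName\":\"gvisor\",\"tolerations\":[{\"key\":\"runtime\",\"operator\":\"Equal\",\"value\":\"gvisor\",\"effect\":\"NoSchedule\"}]}}}}"}`)
	rc, err := runtimeClassOf(step)
	if err != nil {
		panic(err)
	}
	fmt.Println(rc) // prints: gvisor
}
```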
What about workloads that need raw sockets, eBPF, or GPU passthrough?
The classifier marks raw-socket and eBPF workloads incompatible, and the
planner excludes them. GPU passthrough is a warn rule, so those workloads are
classified review and enter the plan only with --include-review. The
reasons[] field on each WorkloadResult names the rule that fired and
links to the relevant gVisor doc, so the recommendation is concrete: either
leave the workload on runc (and put it on a non-gVisor node pool) or
adopt the gVisor option that supports it (--net-raw, nvproxy).
Does agentmoat install anything in-cluster? No CRDs, no webhooks, no
controllers. It patches pod templates and reads/writes one namespace
annotation (agentmoat.io/plan-hash). The only in-cluster prerequisite is
a RuntimeClass named gvisor (or whatever --runtime-class was passed)
that points at a real runsc-shipping node.
Why a custom Packer AMI on EKS? Bottlerocket does not ship runsc,
managed node groups have no gVisor switch, and AWS does not officially
support gVisor. The path of least resistance is to bring your own AL2023
node image with runsc baked in. packer/eks-gvisor-al2023.pkr.hcl is that
image.
Why systrap, not KVM? On EKS the instance kernels do not expose
/dev/kvm to user-space; on macOS-hosted kind, KVM is unavailable too. The
systrap platform works in both environments. It is pinned in
kind/runsc.toml and packer/files/runsc.toml.
Phase 0 (foundation), Phase 1 (read-only scan + classifier), and Phase 2
(planner + applier + rollback, with idempotency and audit) are committed.
agentmoat scan, plan, apply, and rollback are wired and exercised
end-to-end against a real gVisor kind cluster.
Roadmap:
- Phase 3: `agentmoat verify` (pod `runtimeClassName` check; optional `--in-pod-probe` for in-container confirmation) and `agentmoat explain` (embedded docs viewer). Both shipped.
- Phase 5+: EKS end-to-end recipe (CloudFormation/Terraform/Karpenter), additional Packer variants.

- Architecture: the Go library at the core; CLI as a thin shell.
- gVisor 101: Sentry, Gofer, platforms, and where the overhead lives.
- RuntimeClass 101: one-page intro to the `RuntimeClass` API.
- Threat model: what gVisor stops that `runc` does not, with CVE references.
- Compatibility checklist: the full rule catalog and `--rules` override schema.
- Exit codes: the deterministic exit codes by command.
- EKS deployment: the Packer + EKS recipe (stub today; tracked for Phase 5).
- Kind quickstart: bring up a local cluster with gVisor preinstalled (doc is a stub today; `make kind-up` is the working path).

Apache License 2.0. See LICENSE.