ansible-devnet: 8-subnet homogeneous layout and full leanpoint upstreams by ch4r10t33r · Pull Request #176 · blockblaz/lean-quickstart

ch4r10t33r · 2026-05-17T08:19:54Z

Summary

Regenerate ansible-devnet/genesis/validator-config.yaml for a 64-validator / 8-subnet homogeneous devnet: each subnet is a single client family (qlean → lantern → ream → zeam → ethlambda → gean → grandine → nlean).
Aggregators (indices 0–7) use dedicated Aggregator_servers IPs; regular validators use the Validator_servers pool (23 hosts, up to 4 containers per IP). Tooling host 46.225.10.32 is excluded.
Leanpoint now polls every validator in validator-config.yaml by default (removed the earlier cap of 2 upstreams per subnet). sync-leanpoint-upstreams.sh passes --all-upstreams. Legacy subsetting is opt-in via --subnet-sample.
Ansible localhost add_host plays use strategy: linear so they work when ansible.cfg sets strategy=free for large devnet deploys.

Test plan

python3 convert-validator-config.py ansible-devnet/genesis/validator-config.yaml /tmp/up.json → 64 upstreams
python3 convert-validator-config.py ... /tmp/up.json --subnet-sample → 16 upstreams (8 subnets × 2)
spin-node.sh / Ansible deploy against updated validator-config.yaml (dry-run or staging)
sync-leanpoint-upstreams.sh regenerates tooling upstreams.json with full validator list
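
The two expected counts above follow directly from the sampling rule. A minimal sketch, assuming each validator entry carries a subnet id and that sampling keeps the first two entries per subnet; `select_upstreams` and the dictionary layout are hypothetical and are not taken from convert-validator-config.py:

```python
from collections import defaultdict

def select_upstreams(validators, subnet_sample=False, per_subnet=2):
    """Return the validators that would become leanpoint upstreams.

    Hypothetical helper: by default every validator is kept; with
    subnet_sample=True at most `per_subnet` validators per subnet are kept.
    """
    if not subnet_sample:
        return list(validators)
    kept = []
    seen = defaultdict(int)  # subnet id -> number of validators kept so far
    for v in validators:
        if seen[v["subnet"]] < per_subnet:
            seen[v["subnet"]] += 1
            kept.append(v)
    return kept

# Assumed layout for illustration: 64 validators spread over 8 subnets.
validators = [{"name": f"node_{i}", "subnet": i % 8} for i in range(64)]
print(len(select_upstreams(validators)))                      # 64
print(len(select_upstreams(validators, subnet_sample=True)))  # 16
```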

Add support for configuring nodes as aggregators through validator-config.yaml. This allows selective designation of nodes to perform aggregation duties by setting isAggregator: true in the validator configuration.

Changes:
- Add isAggregator field (default: false) to all validators in both local and ansible configs
- Update parse-vc.sh to extract and export isAggregator flag
- Modify all client command scripts to pass --is-aggregator flag when enabled
- Add isAggregator status to node information output

Resolved conflicts in client-cmds scripts by keeping both:
- Aggregator flag support
- Checkpoint sync URL support

Updated Docker images:
- zeam: 0xpartha/zeam:devnet3
- lantern: piertwo/lantern:v0.0.3-test
- ethlambda: ghcr.io/lambdaclass/ethlambda:devnet3

Added httpPort support for lantern nodes.

…int upstreams Regenerate validator-config.yaml for 64 validators across 8 attestation subnets (one client family per subnet). Aggregators sit on dedicated aggregator hosts; regular validators use the Validator_servers IP pool. Leanpoint convert/sync now emits one upstream per validator by default (removed per-subnet cap of two). Optional --subnet-sample restores the legacy subset behavior. Ansible localhost plays that use add_host force strategy: linear so they work with ansible.cfg strategy=free on large devnets.

--prepare now installs tools, opens firewall ports, and starts Prometheus, Promtail, node_exporter, and cadvisor on every host. Add apt retries/throttle, prepare fork cap, and a single retry pass for transient lock failures.

…n slots Aggregators now get --aggregate-subnet-ids for their committee only (validator_index % attestation_committee_count) via parse-vc.sh and ansible zeam/ethlambda roles, not the full 0..N-1 CSV. Client cmd scripts pass a single subnet id; peam allowed_topics match the same rule. Rename former qlean_* / lantern_* validator nodes to zeam_8..15 and ethlambda_8..15 in ansible and local genesis configs to avoid clashing with existing zeam_0..7 / ethlambda_0..7 names.
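
The committee rule above is a single modulo. A minimal sketch of that rule; the function names are hypothetical, and passing the flag and its value as two separate arguments is an assumption, not a detail confirmed by parse-vc.sh or the Ansible roles:

```python
from typing import List

def aggregate_subnet_id(validator_index: int, attestation_committee_count: int) -> int:
    """Subnet an aggregator serves: validator_index % attestation_committee_count."""
    if attestation_committee_count <= 0:
        raise ValueError("attestation_committee_count must be positive")
    return validator_index % attestation_committee_count

def aggregate_subnet_args(validator_index: int, attestation_committee_count: int) -> List[str]:
    """CLI fragment carrying one subnet id instead of the full 0..N-1 CSV."""
    subnet = aggregate_subnet_id(validator_index, attestation_committee_count)
    return ["--aggregate-subnet-ids", str(subnet)]

print(aggregate_subnet_args(10, 8))  # ['--aggregate-subnet-ids', '2']
```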

Set every per-client docker_memory_limits entry to a uniform 4g and make each role's docker run skip --memory/--memory-swap when the node is an aggregator (one container per IP, no co-tenant memory pressure). Previously ream (5g), ethlambda (1.5g) and the rest (3g) were inconsistent, and the cap was applied unconditionally — so aggregators were also being throttled even though they own their host.
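
The conditional cap can be sketched as a small argument builder. This is an illustration only: `docker_memory_args` is a hypothetical name, and setting `--memory-swap` to the same value as `--memory` (which gives the container no swap) is an assumption about the roles, not something the commit message states:

```python
from typing import List

def docker_memory_args(is_aggregator: bool, limit: str = "4g") -> List[str]:
    """Build the memory-related `docker run` arguments for one node.

    Aggregators run one container per IP, so they get no cap at all.
    Every other node gets the uniform limit.
    """
    if is_aggregator:
        return []
    # Assumption: swap is pinned to the same value as the memory limit.
    return ["--memory", limit, "--memory-swap", limit]

print(docker_memory_args(is_aggregator=False))  # ['--memory', '4g', '--memory-swap', '4g']
print(docker_memory_args(is_aggregator=True))   # []
```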

Wire the new zeam --rayon-threads CLI flag (zeam #903 / #899) into both the zeam-cmd.sh shell launcher and the ansible/roles/zeam docker run. There are two knobs, so non-aggregators can stay on zeam's compiled-in auto-split:
- ZEAM_RAYON_THREADS_AGGREGATOR / zeam_rayon_threads_aggregator: aggregator-only override (wins for aggregators)
- ZEAM_RAYON_THREADS / zeam_rayon_threads: uniform override applied to both roles
Both must stay unset (the default) for pre-#903 zeam images, which would refuse the flag and fail to start. The recommended starting value on 16-vCPU hosts is 12 (= cpu_count - 4 reserved system threads).
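
The precedence between the two knobs can be sketched as follows. This reflects only the rule described in this commit message; `rayon_threads_args` is a hypothetical helper, not the logic in zeam-cmd.sh:

```python
from typing import List, Mapping

def rayon_threads_args(is_aggregator: bool, env: Mapping[str, str]) -> List[str]:
    """Resolve the --rayon-threads argument from the two override knobs.

    The aggregator-only override wins for aggregators, the uniform override
    applies to both roles, and with both unset no flag is emitted at all,
    which is what pre-#903 zeam images need.
    """
    aggregator_override = env.get("ZEAM_RAYON_THREADS_AGGREGATOR", "")
    uniform_override = env.get("ZEAM_RAYON_THREADS", "")
    if is_aggregator and aggregator_override:
        value = aggregator_override
    else:
        value = uniform_override
    return ["--rayon-threads", value] if value else []

env = {"ZEAM_RAYON_THREADS_AGGREGATOR": "12"}
print(rayon_threads_args(True, env))   # ['--rayon-threads', '12']
print(rayon_threads_args(False, env))  # []
```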

ch4r10t33r and others added 30 commits February 6, 2026 14:56

Merge branch 'main' of https://github.com/blockblaz/lean-quickstart

0522c16

Merge branch 'main' of https://github.com/blockblaz/lean-quickstart

1522fd6

Merge branch 'main' of https://github.com/blockblaz/lean-quickstart

99e1d5c

Merge branch 'main' of https://github.com/blockblaz/lean-quickstart

53814fa

Merge remote-tracking branch 'origin/main' into main

e8c649d

Resolve zeam-cmd.sh: keep single attestation_committee block and zeam_global_flags in node_binary.

Merge branch 'main' of github.com:blockblaz/lean-quickstart

7bcf5d2

Merge branch 'main' of github.com:blockblaz/lean-quickstart

6dbc842

Merge branch 'main' of github.com:blockblaz/lean-quickstart

d6bce3c

spin-node: add --stop-all-containers for ansible hosts

5fb131f

Stop every Docker container on each unique validator-config IP except the per-host observability stack (prometheus, promtail, cadvisor, node_exporter). Document in README.
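
The selection rule can be sketched as a simple filter. This assumes the observability containers are named exactly as listed in the commit message; `containers_to_stop` is a hypothetical helper and the sample container names are made up for illustration:

```python
from typing import Iterable, List

# Per-host observability stack that must keep running.
OBSERVABILITY = frozenset({"prometheus", "promtail", "cadvisor", "node_exporter"})

def containers_to_stop(running: Iterable[str]) -> List[str]:
    """Return every running container name except the observability stack."""
    return [name for name in running if name not in OBSERVABILITY]

running = ["zeam_0", "prometheus", "ream_3", "node_exporter", "stale_container"]
print(containers_to_stop(running))  # ['zeam_0', 'ream_3', 'stale_container']
```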

spin-node: fix stop-all-containers for /bin/sh hosts

60df7cd

Use ansible command+loop instead of bash process substitution so the playbook runs under /bin/sh. Clarify that all stale containers are removed, not only validator-config names.

ansible: auto-restart containers and docker daemon on OOM/crash

a129744

Configure unless-stopped on all containers via prepare/deploy, systemd Restart=always for docker.service, and a shared group_vars policy for new docker run invocations.

ansible: cap container memory to stop ream/zeam host OOM

f6bdddb

Kernel logs showed ream using up to ~15GiB RSS on 16GiB hosts with 2–3 validators per IP. Add per-client docker --memory limits, tighter limits on the 8GiB host, and run docker-restart-policy only on prepare (not mid-deploy).

ansible: use uniform 3g memory limits on 8gib host

713dfe6

Set all client docker_memory_limits to 3g on 157.90.254.146.

ansible: remove host_vars for deleted 8gib server

f43e360

Drop 157.90.254.146 overrides; all devnet hosts use 16gib defaults.

ansible: default zeam docker image to blockblaz/zeam:devnet4

f60dec7

Replace the ansible fallback when client-cmd extraction fails: 0xpartha/zeam:local → blockblaz/zeam:devnet4 (matches defaults and zeam-cmd.sh).

ansible-devnet: gean on subnet 5, grandine on subnet 7

daee9d4

Replace the former nlean column with grandine aggregators and add a second ream column on subnet 6 so gean and grandine each own one subnet.

ansible-devnet: align aggregator IPs and fix validator hosts

1fe5238

Put grandine_0 on the grandine aggregator host, move ream_13 to the nlean slot, place gean_1 on 95.217.158.60, and relocate validators off hosts not in lean_ethereum_servers.txt.

ansible-devnet: point aggregators at new Aggregator_servers IPs

e24fd52

Replace the eight aggregator host IPs with the Aggregator_servers list from lean_ethereum_servers.txt and keep assign-aggregator-ips.py in sync.

zeam-cmd: default --rayon-threads 12 for aggregators

3679c87

Apply twelve rayon workers whenever isAggregator is true unless ZEAM_RAYON_THREADS_AGGREGATOR overrides it.

ansible,spin-node: always pull client docker images before run

29ecfe4

Add an explicit docker pull step to every client Ansible role and use --pull=always on spin-node docker runs so registry tags are refreshed on each deploy.

Merge branch 'main' into feat/devnet8-homogeneous-leanpoint-full-upstreams

a5bc57e

Merge branch 'main' into feat/devnet8-homogeneous-leanpoint-full-upstreams

bf833e0

ansible, genesis: devnet4 deploy fixes and subnet 2 lantern swap

eed3430

Fix zeam chain-worker and rayon-threads CLI generation, set aggregator rayon to 12, replace ream subnet 2 with lantern, and harden stop-all-containers against unreachable hosts.

zeam: deploy local image and rayon defaults for devnet

fd5bc0e

Use 0xpartha/zeam:local, set non-aggregator rayon to 6 on 8-vCPU hosts, and let 16-vCPU aggregators auto-tune (cpu_count - 4) when no override is set.

ch4r10t33r wants to merge 31 commits into main from feat/devnet8-homogeneous-leanpoint-full-upstreams
