Add live memory ceiling for macOS (vz) VMs #278
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Update file:line references that drifted when the clonefile and rosetta/multi-platform changes merged to main (createVM gained a device block, shifting computeMemorySize and the balloon block by +4; create.go balloon-policy wiring moved into guestMemoryConfig; HotplugSize moved). Correct the integration-test path to lib/instances/, and note that the proposed MemoryCeilingBytes threading and derived-capability flag now have a merged precedent (Platform / derived EnableRosetta take the identical request -> VMConfig -> buildShimConfigFromVMConfig -> ShimConfig path). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
d69eebd to 527ecf7
Apple's Virtualization.framework cannot grow a guest's RAM above its boot size; the only runtime lever is the traditional balloon, which reclaims down from the boot size. This change lets a vz VM boot at a configured memory ceiling, balloon down to its baseline (the normal Size), and have the existing host-pressure controller move the balloon target within [floor, ceiling], giving elastic usable memory with no reboot.
- ShimConfig.MemoryCeilingBytes: createVM boots at the ceiling when it exceeds the baseline and makes the balloon mandatory (it attaches regardless of the enable/require flags, and creation fails if it cannot attach). After Start, the shim balloons the guest down to the baseline on cold boot; restore resumes an already-ballooned guest.
- Thread the ceiling request -> stored instance -> hypervisor.VMConfig -> buildShimConfigFromVMConfig -> ShimConfig, mirroring the Platform/EnableRosetta path. It rides ShimConfig through the snapshot manifest, so restore needs no extra change.
- Validation: a ceiling of 0 means no ceiling (identical to today); a ceiling at or below Size is rejected; a ceiling above the per-instance memory limit is rejected; the host-RAM bound is enforced by the shim's vz Validate().
- providers: ListBalloonVMs reports AssignedMemoryBytes = max(size+hotplug, ceiling), so the controller's upper clamp becomes the ceiling, plus a baseline at which the controller holds while the host is healthy. The protected floor is anchored on the baseline rather than the ceiling.
- Grow-on-demand: GrowOnDemandEnabled (default off) and GrowUtilizationPercent (default 85, clamped to 1..99) on ActiveBallooningConfig, plus growthTargetBytes; with the flag off, behavior is byte-identical to today. A measured guest-memory signal is left as a follow-up.
- SupportsLiveMemoryCeiling capability, derived per-instance for vz when a ceiling is configured; SupportsHotplugMemory stays false.
- config.example.darwin.yaml gains grow_on_demand_enabled/grow_utilization_percent and a corrected hotplug note.
Builds on the threading precedent from #279 and the fork fast path in #276. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
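The create-time validation rules listed in the commit above can be sketched as a small Go function. This is a minimal sketch, not the PR's code: validateMemoryCeiling and its error values are hypothetical names, and treating a perInstanceLimit of 0 as "no limit" is an assumption. The host-RAM bound is deliberately absent because the commit leaves it to the shim's vz Validate().

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values; the PR's actual error types are not shown.
var (
	errCeilingNotAboveSize = errors.New("memory ceiling must exceed size")
	errCeilingAboveLimit   = errors.New("memory ceiling exceeds per-instance memory limit")
)

// validateMemoryCeiling (hypothetical) applies the rules from the commit
// message: 0 means no ceiling, a ceiling <= size is rejected, and a ceiling
// above the per-instance limit is rejected. Assumption: perInstanceLimit == 0
// means "no limit". The host-RAM bound is left to the shim.
func validateMemoryCeiling(size, ceiling, perInstanceLimit int64) error {
	if ceiling == 0 {
		return nil // no ceiling: identical to today's behavior
	}
	if ceiling <= size {
		return fmt.Errorf("%w: ceiling %d, size %d", errCeilingNotAboveSize, ceiling, size)
	}
	if perInstanceLimit > 0 && ceiling > perInstanceLimit {
		return fmt.Errorf("%w: ceiling %d, limit %d", errCeilingAboveLimit, ceiling, perInstanceLimit)
	}
	return nil
}

func main() {
	const gib = int64(1 << 30)
	fmt.Println(validateMemoryCeiling(gib, 0, 8*gib))             // <nil>
	fmt.Println(validateMemoryCeiling(gib, gib, 8*gib) != nil)    // true
	fmt.Println(validateMemoryCeiling(gib, 4*gib, 8*gib))         // <nil>
	fmt.Println(validateMemoryCeiling(gib, 16*gib, 8*gib) != nil) // true
}
```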
Add a human-readable memory_ceiling string to the create-instance request and the Instance response (mirroring size/hotplug_size), regenerate the OpenAPI bindings, and wire it through the create handler and the instance->response mapping. The request parses to bytes via the same datasize helper; the response reports the resolved ceiling, omitted when no ceiling is set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
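To illustrate how a human-readable memory_ceiling string could become the byte count stored on the instance, here is a minimal parser sketch. It is a hypothetical stand-in for the datasize helper the commit mentions: it only accepts a bare byte count or a KiB/MiB/GiB suffix, and mapping an empty string to 0 ("no ceiling") is an assumption about how the create handler treats an absent field.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseMemoryCeiling is a hypothetical stand-in for the real datasize helper,
// whose accepted syntax may differ. It accepts a bare byte count or a
// KiB/MiB/GiB suffix. An empty string maps to 0, i.e. "no ceiling".
func parseMemoryCeiling(raw string) (int64, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return 0, nil
	}
	mult := int64(1)
	for _, u := range []struct {
		suffix string
		mult   int64
	}{{"GiB", 1 << 30}, {"MiB", 1 << 20}, {"KiB", 1 << 10}} {
		if strings.HasSuffix(s, u.suffix) {
			s = strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			mult = u.mult
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	// Reject parse errors, negatives, and values whose byte count overflows int64.
	if err != nil || n < 0 || n > math.MaxInt64/mult {
		return 0, fmt.Errorf("invalid memory_ceiling %q", raw)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"", "4GiB", "512 MiB", "1073741824", "lots"} {
		b, err := parseMemoryCeiling(in)
		fmt.Printf("%q -> %d, err=%v\n", in, b, err)
	}
}
```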
Linux unit tests (run everywhere): growthTargetBytes (no-op when disabled, grows to the ceiling only above the utilization threshold, never beyond the ceiling, never below the floor); ActiveBallooningConfig.Normalize clamps for the new grow fields; the controller holds a ceiling VM at its baseline when grow is off and still recovers an ordinary reclaimed VM to full; ceiling validation (reject <= size, accept > size); providers AssignedMemoryBytes = max(size+hotplug, ceiling) with baseline = size+hotplug. Darwin integration test TestVZMemoryCeiling (gated by HYPEMAN_RUN_GUESTMEMORY_TESTS=1, arm64): boots an nginx:alpine vz VM with Size=1GiB and a 4GiB ceiling, asserts a balloon device is attached, that /proc/meminfo MemTotal reflects the ~4GiB boot ceiling, that the balloon settles near 1GiB with a low host RSS, then deflates to 4GiB and asserts usable memory climbs without a reboot. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
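The growthTargetBytes behavior those unit tests pin down (no-op when disabled, grow to the ceiling only above the utilization threshold, never beyond the ceiling, never below the floor), together with the Normalize default and clamp, can be sketched as follows. This is a simplified sketch under assumptions: growConfig models only the two grow fields, the parameter list is hypothetical, "above the threshold" is taken as strictly greater, and 0 is treated as "unset" by normalize. Per the controller comment in this PR, the real utilization signal is 0 today, so the grow branch is inert.

```go
package main

import "fmt"

// growConfig models only the two grow fields described in this PR; the real
// ActiveBallooningConfig carries many more settings.
type growConfig struct {
	GrowOnDemandEnabled    bool
	GrowUtilizationPercent int
}

// normalize applies the described default (85) and the 1..99 clamp.
// Treating 0 as "unset" is an assumption.
func (c *growConfig) normalize() {
	if c.GrowUtilizationPercent == 0 {
		c.GrowUtilizationPercent = 85
	}
	if c.GrowUtilizationPercent < 1 {
		c.GrowUtilizationPercent = 1
	}
	if c.GrowUtilizationPercent > 99 {
		c.GrowUtilizationPercent = 99
	}
}

// growthTargetBytes (simplified, hypothetical signature): hold at baseline
// unless grow-on-demand is enabled and utilization is strictly above the
// threshold, in which case target the ceiling. The result never exceeds the
// ceiling and never drops below the floor.
func growthTargetBytes(cfg growConfig, baseline, ceiling, floor int64, utilizationPercent int) int64 {
	target := baseline
	if cfg.GrowOnDemandEnabled && utilizationPercent > cfg.GrowUtilizationPercent {
		target = ceiling
	}
	if target > ceiling {
		target = ceiling
	}
	if target < floor {
		target = floor
	}
	return target
}

func main() {
	const mib = int64(1 << 20)
	off := growConfig{}
	off.normalize()
	on := growConfig{GrowOnDemandEnabled: true}
	on.normalize()
	fmt.Println(growthTargetBytes(off, 1024*mib, 4096*mib, 512*mib, 95) / mib) // 1024
	fmt.Println(growthTargetBytes(on, 1024*mib, 4096*mib, 512*mib, 50) / mib)  // 1024
	fmt.Println(growthTargetBytes(on, 1024*mib, 4096*mib, 512*mib, 95) / mib)  // 4096
}
```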
✱ Stainless preview builds for hypeman
✅ hypeman-openapi
✅ hypeman-go
✅ hypeman-typescript
Measure currentTotalReclaim below the baseline anchor instead of the ceiling: a ceiling VM idling at its baseline reclaims nothing real (the ballooned pages were never resident), so counting its headroom as reclaim under the Stressed branch squeezed co-tenants below their own baseline. Compute the proportional reclaim split in 128 bits. With a large ceiling the operands approach total headroom, and the int64 product overflowed once they exceeded ~2.8GiB, wrapping to a negative reclaim that corrupted the split (one VM absorbed everything down to its floor while its peer gave up nothing) and silently failed to reclaim under real pressure. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
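The overflow and the 128-bit fix described above can be reproduced with Go's math/bits. This sketch assumes the split has the form remainder * headroom / totalHeadroom per VM (the commit does not spell out the formula), and naiveShare/proportionalShare are hypothetical names. Clamping headroom to the total keeps the quotient no larger than remainder, so it fits in int64 and bits.Div64 cannot panic on quotient overflow.

```go
package main

import (
	"fmt"
	"math/bits"
)

// naiveShare computes remainder*headroom/total in int64. The product wraps
// once both operands exceed roughly sqrt(2^63) bytes, about 2.8GiB.
func naiveShare(remainder, headroom, total int64) int64 {
	return remainder * headroom / total
}

// proportionalShare computes remainder*headroom/total with a 128-bit
// intermediate product (bits.Mul64), then divides it back down (bits.Div64).
func proportionalShare(remainder, headroom, total int64) int64 {
	if remainder <= 0 || headroom <= 0 || total <= 0 {
		return 0
	}
	if headroom > total {
		headroom = total // keeps hi < total, so Div64 cannot panic
	}
	hi, lo := bits.Mul64(uint64(remainder), uint64(headroom))
	q, _ := bits.Div64(hi, lo, uint64(total))
	return int64(q) // q <= remainder, so it fits in int64
}

func main() {
	const gib = int64(1 << 30)
	remainder, headroom, total := 3*gib, 3*gib, 4*gib
	fmt.Println(naiveShare(remainder, headroom, total))        // -1879048192 (wrapped)
	fmt.Println(proportionalShare(remainder, headroom, total)) // 2415919104 (2.25GiB)
}
```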
Reject a non-zero memory ceiling at create time for any backend other than vz: backends with real hotplug ignore the ceiling and boot at size, but the controller was still told their assigned max was the ceiling, mis-accounting their reclaim headroom. Fail VM creation in the shim when a ceiling exceeds the host maximum instead of silently clamping the boot size: a clamped boot left the controller treating the full (unreachable) ceiling as the balloon's upper bound. This makes the boot-ceiling contract's reject rule real, so a running ceiling VM's assigned max always equals its actual boot size. Also log the real boot size on start, drop the inaccurate "Resolved" wording from the response field, and document that the live-ceiling capability is only known on the start/restore client. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
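The two rules in this commit (reject a non-zero ceiling on non-vz backends at create time, and fail instead of clamping when the ceiling exceeds the host maximum) can be sketched as below. Both functions are hypothetical: hostMax stands in for the vz maximum allowed memory size, and the string backend names are placeholders for the real hypervisor.Type values.

```go
package main

import "fmt"

// bootMemoryBytes (hypothetical) shows the boot-ceiling contract: boot at the
// ceiling only when it exceeds the baseline, make the balloon mandatory in
// that case, and fail (rather than clamp) when the ceiling is above hostMax.
func bootMemoryBytes(baseline, ceiling, hostMax uint64) (boot uint64, balloonMandatory bool, err error) {
	if ceiling <= baseline {
		// No usable ceiling: boot at the baseline; balloon attachment stays
		// governed by the normal enable/require flags.
		return baseline, false, nil
	}
	if ceiling > hostMax {
		return 0, false, fmt.Errorf("memory ceiling %d exceeds host maximum %d", ceiling, hostMax)
	}
	return ceiling, true, nil
}

// checkCeilingBackend (hypothetical) mirrors the create-time rule that a
// non-zero ceiling is accepted only for vz.
func checkCeilingBackend(backend string, ceiling int64) error {
	if ceiling != 0 && backend != "vz" {
		return fmt.Errorf("memory_ceiling is only supported on vz, got backend %q", backend)
	}
	return nil
}

func main() {
	const gib = uint64(1 << 30)
	boot, mandatory, err := bootMemoryBytes(gib, 4*gib, 16*gib)
	fmt.Println(boot/gib, mandatory, err) // 4 true <nil>
	_, _, err = bootMemoryBytes(gib, 32*gib, 16*gib)
	fmt.Println(err != nil) // true
	fmt.Println(checkCeilingBackend("cloud-hypervisor", int64(4*gib)) != nil) // true
}
```

Failing outright keeps the controller's view honest: a running ceiling VM's assigned max always equals the size it actually booted at.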
Temporary CI step to prove the gated TestVZMemoryCeiling darwin integration test (boot-at-ceiling, balloon-to-baseline, live grow) actually passes on the real macOS arm64 runner. Reverted before merge. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The traditional balloon target is a request the guest fulfills lazily, so reading guest MemAvailable immediately after the target reaches baseline captured a near-ceiling value and made the subsequent grow assertion impossible. Wait for the guest's MemAvailable to actually fall below the ceiling midpoint (proving the balloon inflated in-guest) before measuring the grow baseline. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ilable Usable memory on the traditional balloon is the target the controller sets; the guest fulfills it lazily and may not reflect it in MemAvailable, and the design accounts on the target, not the achieved size. Drop the guest-visible MemAvailable grow/shrink assertions and verify the actual contract instead: the guest boots at the ceiling (MemTotal), the target settles at the baseline with low resident host RSS, and the target can be raised back to the ceiling with the VM still Running (no reboot). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The gated TestVZMemoryCeiling integration test was verified passing on the macOS arm64 runner; restore test.yml to match main. Feature and test code are unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The grow_on_demand_enabled knob is plumbed but inert until a per-guest memory-demand signal exists, so the operator-facing example should say so rather than presenting it as a working toggle. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Pressure reclaim grows ceiling VMs
- Pressure planning now computes ceiling-VM reclaim headroom and targets from baseline (not assigned ceiling), preventing pressure reclaim from increasing balloon targets.
Or push these changes by commenting:
@cursor push 339e327cdb
Preview (339e327cdb)
diff --git a/lib/guestmemory/controller.go b/lib/guestmemory/controller.go
--- a/lib/guestmemory/controller.go
+++ b/lib/guestmemory/controller.go
@@ -187,12 +187,13 @@
}
currentTotalReclaim += currentReclaim
+ reclaimBase := clampInt64(floorAnchorBytes(vm), protectedFloor, vm.AssignedMemoryBytes)
candidates = append(candidates, candidateState{
vm: vm,
hv: hv,
currentTargetGuestBytes: currentTarget,
protectedFloorBytes: protectedFloor,
- maxReclaimBytes: maxInt64(0, vm.AssignedMemoryBytes-protectedFloor),
+ maxReclaimBytes: maxInt64(0, reclaimBase-protectedFloor),
})
}
summary.eligibleVMs = len(candidates)
diff --git a/lib/guestmemory/planner.go b/lib/guestmemory/planner.go
--- a/lib/guestmemory/planner.go
+++ b/lib/guestmemory/planner.go
@@ -41,7 +41,7 @@
var totalHeadroom int64
for _, candidate := range candidates {
totalHeadroom += candidate.maxReclaimBytes
- targets[candidate.vm.ID] = candidate.vm.AssignedMemoryBytes
+ targets[candidate.vm.ID] = candidate.baselineGuestBytes()
}
if totalHeadroom <= 0 {
return targets
@@ -61,7 +61,7 @@
if reclaim > candidate.maxReclaimBytes {
reclaim = candidate.maxReclaimBytes
}
- targets[candidate.vm.ID] = candidate.vm.AssignedMemoryBytes - reclaim
+ targets[candidate.vm.ID] = candidate.baselineGuestBytes() - reclaim
remainder -= reclaim
}
@@ -69,7 +69,7 @@
if remainder <= 0 {
break
}
- currentReclaim := candidate.vm.AssignedMemoryBytes - targets[candidate.vm.ID]
+ currentReclaim := candidate.baselineGuestBytes() - targets[candidate.vm.ID]
headroomLeft := candidate.maxReclaimBytes - currentReclaim
if headroomLeft <= 0 {
continue
diff --git a/lib/guestmemory/planner_test.go b/lib/guestmemory/planner_test.go
--- a/lib/guestmemory/planner_test.go
+++ b/lib/guestmemory/planner_test.go
@@ -24,6 +24,33 @@
}
}
+func TestPlanGuestTargetsCeilingVMReclaimStartsFromBaseline(t *testing.T) {
+ const gib = int64(1024 * 1024 * 1024)
+ const baseline = 1 * gib
+ const ceiling = 4 * gib
+ const floor = baseline / 2
+
+ // Ceiling VMs run at baseline when healthy; pressure reclaim must start from
+ // that baseline, never by "reclaiming" from the ceiling and growing the
+ // balloon target above baseline.
+ candidates := []candidateState{
+ {
+ vm: BalloonVM{
+ ID: "vz-ceiling",
+ AssignedMemoryBytes: ceiling,
+ BaselineMemoryBytes: baseline,
+ },
+ protectedFloorBytes: floor,
+ maxReclaimBytes: baseline - floor,
+ },
+ }
+
+ targets := planGuestTargets(ActiveBallooningConfig{}, candidates, gib)
+ if got := targets["vz-ceiling"]; got != floor {
+ t.Fatalf("ceiling VM reclaim should plan down from baseline to floor %d, got %d", floor, got)
+ }
+}
+
func TestFloorAnchorBytesUsesBaselineForCeilingVM(t *testing.T) {
const gib = int64(1024 * 1024 * 1024)
const baseline = 1 * gib
planGuestTargets computed each guest's reclaim target as AssignedMemoryBytes - reclaim, and maxReclaimBytes as AssignedMemoryBytes - floor. For a ceiling VM idling at its baseline (target well below the ceiling) that lands above the baseline, so under host pressure the reconcile loop raised the balloon target — inflating the guest instead of reclaiming it. Anchor both on floorAnchorBytes (the baseline; equal to the assigned size for ordinary VMs, so their behavior is unchanged) so reclaim moves a ceiling VM from its baseline toward its floor. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
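A single-VM numeric example makes the bug above concrete. This is a deliberately simplified sketch: the real planGuestTargets splits total demand proportionally across candidates, while here one hypothetical guest takes all of it. With a 4GiB ceiling, a 1GiB baseline, a 512MiB floor, and 1GiB of demanded reclaim, anchoring on the ceiling yields a 3GiB target (raising the balloon target above the current 1GiB, which inflates the guest), while anchoring on the baseline yields 512MiB. For an ordinary VM (assigned == baseline) the two agree.

```go
package main

import "fmt"

// guest holds the sizes the planner uses (hypothetical type). For an
// ordinary VM, assigned == baseline.
type guest struct {
	assigned, baseline, floor int64
}

// targetFrom takes the demanded reclaim from anchor, capped at the headroom
// between anchor and floor.
func targetFrom(anchor, floor, reclaim int64) int64 {
	headroom := anchor - floor
	if headroom < 0 {
		headroom = 0
	}
	if reclaim > headroom {
		reclaim = headroom
	}
	return anchor - reclaim
}

// oldTarget anchors on AssignedMemoryBytes (the ceiling), as before the fix.
func oldTarget(g guest, reclaim int64) int64 { return targetFrom(g.assigned, g.floor, reclaim) }

// newTarget anchors on the baseline, as floorAnchorBytes does after the fix.
func newTarget(g guest, reclaim int64) int64 { return targetFrom(g.baseline, g.floor, reclaim) }

func main() {
	const gib = int64(1 << 30)
	ceilingVM := guest{assigned: 4 * gib, baseline: gib, floor: gib / 2}
	// Idling at a 1GiB target, the host asks this VM for 1GiB of reclaim.
	fmt.Println(float64(oldTarget(ceilingVM, gib)) / float64(gib)) // 3 (target rises: guest inflates)
	fmt.Println(float64(newTarget(ceilingVM, gib)) / float64(gib)) // 0.5 (reclaim stops at the floor)
}
```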
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Controller undoes balloon API grow
- When healthy with grow-on-demand disabled, reconcile now holds at max(baseline,current target) so externally grown balloon targets are preserved instead of being forced back to baseline, and a regression test covers this case.
Or push these changes by commenting:
@cursor push 5c22aead5a
Preview (5c22aead5a)
diff --git a/lib/guestmemory/controller.go b/lib/guestmemory/controller.go
--- a/lib/guestmemory/controller.go
+++ b/lib/guestmemory/controller.go
@@ -227,18 +227,24 @@
summary.manualHoldActive = state.manualHold != nil
plannedTargets := planGuestTargets(c.config, candidates, totalTarget)
- // No reclaim demanded means the host is healthy: hold each guest at its
- // baseline (baseline == assigned for ordinary VMs, so this recovers prior
- // reclaim unchanged). growthTargetBytes returns the baseline while grow-on-demand
- // is off. This per-VM grow has no aggregate host-RAM cap, which is safe only
- // because utilizationPercent() is 0 today; a real signal (RFC milestone 4) must
- // route grow through a host-aware planner.
+ // No reclaim demanded means the host is healthy: recover reclaimed guests up to
+ // their baseline (baseline == assigned for ordinary VMs), but do not pull down
+ // a target that was already grown externally (for example through the balloon
+ // API) while grow-on-demand is disabled.
+ //
+ // This per-VM grow has no aggregate host-RAM cap, which is safe only because
+ // utilizationPercent() is 0 today; a real signal (RFC milestone 4) must route
+ // grow through a host-aware planner.
if totalTarget == 0 {
for _, candidate := range candidates {
baseline := candidate.baselineGuestBytes()
+ holdTarget := baseline
+ if !c.config.GrowOnDemandEnabled {
+ holdTarget = maxInt64(holdTarget, candidate.currentTargetGuestBytes)
+ }
plannedTargets[candidate.vm.ID] = growthTargetBytes(
c.config,
- baseline,
+ holdTarget,
candidate.vm.AssignedMemoryBytes,
candidate.protectedFloorBytes,
candidate.utilizationPercent(),
diff --git a/lib/guestmemory/controller_test.go b/lib/guestmemory/controller_test.go
--- a/lib/guestmemory/controller_test.go
+++ b/lib/guestmemory/controller_test.go
@@ -189,6 +189,41 @@
assert.Equal(t, baseline, hv.target, "balloon target must remain at baseline")
}
+func TestHealthyPreservesExternallyGrownCeilingVMWhenGrowOnDemandOff(t *testing.T) {
+ const mib = int64(1024 * 1024)
+ const baseline = 1024 * mib
+ const ceiling = 4096 * mib
+
+ src := &stubSource{
+ vms: []BalloonVM{
+ {ID: "a", Name: "a", HypervisorType: hypervisor.TypeVZ, SocketPath: "a", AssignedMemoryBytes: ceiling, BaselineMemoryBytes: baseline},
+ },
+ }
+ // Simulate a prior live grow via the balloon API.
+ hv := &stubHypervisor{target: ceiling, capabilities: hypervisor.Capabilities{SupportsBalloonControl: true}}
+
+ c := NewController(Policy{Enabled: true, ReclaimEnabled: true}, ActiveBallooningConfig{
+ Enabled: true,
+ ProtectedFloorPercent: 50,
+ ProtectedFloorMinBytes: 0,
+ MinAdjustmentBytes: 1,
+ PerVMMaxStepBytes: ceiling,
+ PerVMCooldown: time.Millisecond,
+ GrowOnDemandEnabled: false,
+ }, src, slog.New(slog.NewTextHandler(io.Discard, nil))).(*controller)
+ c.sampler = &stubSampler{sample: HostPressureSample{TotalBytes: 64 * 1024 * mib, AvailableBytes: 32 * 1024 * mib, AvailablePercent: 50}}
+ c.reconcileMu.newClient = func(_ hypervisor.Type, _ string) (hypervisor.Hypervisor, error) {
+ return hv, nil
+ }
+
+ resp, err := c.TriggerReclaim(context.Background(), ManualReclaimRequest{ReclaimBytes: 0})
+ require.NoError(t, err)
+ require.Len(t, resp.Actions, 1)
+ assert.Equal(t, "unchanged", resp.Actions[0].Status)
+ assert.Equal(t, ceiling, resp.Actions[0].PlannedTargetGuestMemoryBytes, "healthy reconcile must not undo external balloon API grow")
+ assert.Equal(t, ceiling, hv.target, "balloon target must remain at externally grown value")
+}
+
func TestStressedCeilingVMAtBaselineDoesNotSqueezeCoTenant(t *testing.T) {
const mib = int64(1024 * 1024)
const baseline = 1024 * mib
The healthy-state reconcile forced every guest to its baseline, so a ceiling VM grown above baseline via the balloon API (or a future auto-grow signal) was pulled back down on the next poll, making the ceiling headroom unreachable whenever active ballooning is enabled. Hold at max(currentTarget, baseline) instead: reclaimed guests still recover up to baseline, but an explicit grow is preserved. Reclaim under pressure is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Reclaim metrics use ceiling anchor
- Reconcile now computes planned and applied reclaim from each VM’s floor anchor (baseline for ceiling VMs) with a zero floor, eliminating the ceiling-minus-baseline phantom reclaim in response totals and per-action metrics.
- ✅ Fixed: Baseline ignores vz hotplug split
- Provider mapping now forces vz baseline memory to Size (ignoring hotplug) so controller baseline/floor behavior matches the vz shim’s balloon baseline and avoids inflated baseline targets.
Or push these changes by commenting:
@cursor push 4ee4790473
Preview (4ee4790473)
diff --git a/lib/guestmemory/controller.go b/lib/guestmemory/controller.go
--- a/lib/guestmemory/controller.go
+++ b/lib/guestmemory/controller.go
@@ -307,7 +307,8 @@
Status: "unchanged",
}
- resp.PlannedReclaimBytes += candidate.vm.AssignedMemoryBytes - plannedTarget
+ anchorBytes := floorAnchorBytes(candidate.vm)
+ resp.PlannedReclaimBytes += maxInt64(0, anchorBytes-plannedTarget)
if !req.dryRun && appliedTarget != candidate.currentTargetGuestBytes {
if err := candidate.hv.SetTargetGuestMemoryBytes(applyCtx, appliedTarget); err != nil {
@@ -335,7 +336,7 @@
action.TargetGuestMemoryBytes = appliedTarget
}
if !req.dryRun {
- action.AppliedReclaimBytes = candidate.vm.AssignedMemoryBytes - action.TargetGuestMemoryBytes
+ action.AppliedReclaimBytes = maxInt64(0, anchorBytes-action.TargetGuestMemoryBytes)
}
resp.AppliedReclaimBytes += action.AppliedReclaimBytes
resp.Actions = append(resp.Actions, action)
diff --git a/lib/guestmemory/controller_test.go b/lib/guestmemory/controller_test.go
--- a/lib/guestmemory/controller_test.go
+++ b/lib/guestmemory/controller_test.go
@@ -184,7 +184,10 @@
resp, err := c.TriggerReclaim(context.Background(), ManualReclaimRequest{ReclaimBytes: 0})
require.NoError(t, err)
require.Len(t, resp.Actions, 1)
+ assert.Equal(t, int64(0), resp.PlannedReclaimBytes, "baseline-held ceiling VM should not report phantom planned reclaim")
+ assert.Equal(t, int64(0), resp.AppliedReclaimBytes, "baseline-held ceiling VM should not report phantom applied reclaim")
assert.Equal(t, "unchanged", resp.Actions[0].Status)
+ assert.Equal(t, int64(0), resp.Actions[0].AppliedReclaimBytes, "per-action reclaim should be anchored to baseline")
assert.Equal(t, baseline, resp.Actions[0].TargetGuestMemoryBytes, "ceiling VM should hold at baseline, not grow to ceiling, when grow-on-demand is off")
assert.Equal(t, baseline, hv.target, "balloon target must remain at baseline")
}
diff --git a/lib/providers/providers.go b/lib/providers/providers.go
--- a/lib/providers/providers.go
+++ b/lib/providers/providers.go
@@ -305,10 +305,13 @@
// balloonVMForInstance maps a stored instance onto the controller's view. The
// baseline is the guest's normal running size; a vz boot ceiling is the live
// maximum the balloon can deflate to, so it drives the controller's upper clamp
-// while the baseline is the size held when the host is healthy. HotplugSize is 0
-// on vz, so the max keeps non-vz backends correct if hotplug is ever populated.
+// while the baseline is the size held when the host is healthy. vz ignores
+// hotplug, so its baseline remains Size; other backends keep size+hotplug.
func balloonVMForInstance(inst instances.Instance) guestmemory.BalloonVM {
baseline := inst.Size + inst.HotplugSize
+ if inst.HypervisorType == hypervisor.TypeVZ {
+ baseline = inst.Size
+ }
assigned := baseline
if inst.MemoryCeilingBytes > assigned {
assigned = inst.MemoryCeilingBytes
diff --git a/lib/providers/providers_test.go b/lib/providers/providers_test.go
--- a/lib/providers/providers_test.go
+++ b/lib/providers/providers_test.go
@@ -4,6 +4,7 @@
"testing"
"github.com/kernel/hypeman/cmd/api/config"
+ "github.com/kernel/hypeman/lib/hypervisor"
"github.com/kernel/hypeman/lib/instances"
snapshotstore "github.com/kernel/hypeman/lib/snapshot"
"github.com/stretchr/testify/assert"
@@ -34,6 +35,13 @@
}})
assert.Equal(t, 2*gib, lowCeiling.AssignedMemoryBytes)
assert.Equal(t, 2*gib, lowCeiling.BaselineMemoryBytes)
+
+ // vz ignores hotplug sizing, so controller baseline must stay at Size.
+ vzWithHotplug := balloonVMForInstance(instances.Instance{StoredMetadata: instances.StoredMetadata{
+ Id: "d", HypervisorType: hypervisor.TypeVZ, Size: gib, HotplugSize: gib / 2, MemoryCeilingBytes: 3 * gib,
+ }})
+ assert.Equal(t, 3*gib, vzWithHotplug.AssignedMemoryBytes)
+ assert.Equal(t, gib, vzWithHotplug.BaselineMemoryBytes)
}
func TestSnapshotDefaultsFromConfigDisabledReturnsNilCompression(t *testing.T) {
Two consistency gaps from baseline-anchored reclaim:
- PlannedReclaimBytes/AppliedReclaimBytes still subtracted the target from AssignedMemoryBytes (the ceiling), so a ceiling VM idling at baseline reported phantom reclaim of ~ceiling-baseline. Anchor the reported reclaim on floorAnchorBytes too, clamped at zero.
- balloonVMForInstance set the controller baseline to Size+HotplugSize, but the vz shim balloons to MemoryBytes (== Size) and ignores hotplug, so a vz ceiling VM with a hotplug size diverged. Mirror the shim: a ceiling VM baselines at Size and is capped at the ceiling.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Small baseline below protected floor
- Protected floors are now capped at each VM’s floor anchor so small-baseline ceiling VMs are held at baseline instead of being inflated to the global minimum floor.
Or push these changes by commenting:
@cursor push 45bd39c88a
Preview (45bd39c88a)
diff --git a/lib/guestmemory/planner.go b/lib/guestmemory/planner.go
--- a/lib/guestmemory/planner.go
+++ b/lib/guestmemory/planner.go
@@ -89,7 +89,7 @@
func protectedFloorBytes(cfg ActiveBallooningConfig, anchor int64) int64 {
percentFloor := (anchor * int64(cfg.ProtectedFloorPercent)) / 100
- return maxInt64(cfg.ProtectedFloorMinBytes, percentFloor)
+ return minInt64(anchor, maxInt64(cfg.ProtectedFloorMinBytes, percentFloor))
}
// floorAnchorBytes is the size the protected floor is computed against: the
diff --git a/lib/guestmemory/planner_test.go b/lib/guestmemory/planner_test.go
--- a/lib/guestmemory/planner_test.go
+++ b/lib/guestmemory/planner_test.go
@@ -45,6 +45,22 @@
}
}
+func TestProtectedFloorBytesCappedAtAnchor(t *testing.T) {
+ const mib = int64(1024 * 1024)
+ const baseline = 256 * mib
+
+ cfg := ActiveBallooningConfig{
+ ProtectedFloorPercent: 50,
+ ProtectedFloorMinBytes: 512 * mib,
+ }
+
+ // The protected floor must never exceed the baseline anchor; otherwise a
+ // small-baseline ceiling VM is forced above baseline on healthy reconciles.
+ if got := protectedFloorBytes(cfg, baseline); got != baseline {
+ t.Fatalf("protected floor should cap at anchor %d, got %d", baseline, got)
+ }
+}
+
func TestAutomaticTargetBytesStressedHoldsCurrentReclaim(t *testing.T) {
const gib = int64(1024 * 1024 * 1024)
cfg := ActiveBallooningConfig{PressureLowWatermarkAvailablePercent: 15}
Reviewed by Cursor Bugbot for commit ede5689.
The floor was capped at AssignedMemoryBytes (the ceiling), so a ceiling VM whose baseline (Size) is below protected_floor_min_bytes got a floor above its baseline — the healthy reconcile then raised the guest up to the floor instead of holding it at the baseline the shim set. Cap the floor at floorAnchorBytes (the baseline; == assigned for ordinary VMs, unchanged). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Created a monitoring plan for this PR. What this PR does: Ships the live memory ceiling infrastructure for macOS (vz) VMs — a new Intended effect:
Risks:


Summary
Implements live memory resize for macOS (vz) VMs via a boot ceiling. A vz VM can be created with a memory ceiling above its baseline size; the shim boots the VM at the ceiling and immediately balloons it down to the baseline, and the existing host-pressure controller then moves the balloon target within [floor, ceiling], growing toward the ceiling and shrinking under host pressure, with no reboot. vz only (Cloud Hypervisor / Firecracker have real hotplug and ignore it).
The committed design doc (docs/proposals/memory-hotplug-resize.md) is the contract; this PR implements milestones 1–2 plus the grow-on-demand plumbing.
What's included
- memory_ceiling (human-readable string, mirroring size/hotplug_size) on the create-instance request and the Instance response (openapi.yaml + regenerated lib/oapi/oapi.go).
- MemoryCeilingBytes flows CreateInstanceRequest → stored instance → hypervisor.VMConfig → buildShimConfigFromVMConfig → ShimConfig, the same path the merged Platform/EnableRosetta fields take, and persists through the snapshot manifest (no restore change).
- SupportsLiveMemoryCeiling (derived, internal).
- Normalize clamps, validation, providers ceiling clamp, floor anchor, overflow split) + a darwin integration test (TestVZMemoryCeiling) that boots at the ceiling, asserts the guest sees ~ceiling MemTotal, balloons to baseline at low host RSS, and grows live to the ceiling without a reboot.
Deferred (out of scope here)
Validation
- test, test-darwin, e2e-install).
- TestVZMemoryCeiling was verified to run and pass on the real macOS arm64 runner (a temporary CI step, since reverted).
With no ceiling set, behavior is byte-identical to today.
Note
Medium Risk
Touches instance creation, vz boot/shim paths, and host memory pressure accounting; mis-accounting could affect multi-tenant reclaim, though defaults preserve prior behavior when no ceiling is set.
Overview
Adds live memory elasticity for macOS (vz) VMs via an optional memory_ceiling on create (and on the Instance API): when ceiling > baseline size, the guest can use RAM between baseline and ceiling without a reboot, using boot-at-ceiling plus the virtio balloon instead of true hotplug.
vz-shim boots at the ceiling when set, requires the balloon, rejects ceilings above the host RAM max, and on cold start balloons the guest down to baseline after vm.Start. Create/validation threads MemoryCeilingBytes through instance metadata and VMConfig, enforces vz-only and ceiling > size, and rejects a ceiling above the per-instance memory limit.
Active ballooning gains a BaselineMemoryBytes distinct from the ceiling reported as AssignedMemoryBytes, reclaims from the baseline anchor (so idle ceiling VMs don't steal reclaim from neighbors), and on a healthy host holds at baseline or preserves manual/API grows. It also adds optional grow_on_demand_* config (still inert until a guest utilization signal exists) and a 128-bit proportional reclaim split to avoid overflow. SupportsLiveMemoryCeiling is exposed on the vz client after start/restore.
Docs include an RFC (docs/proposals/memory-hotplug-resize.md), a darwin example config, and unit plus darwin integration tests for ceiling boot, baseline settle, and live balloon grow.
Reviewed by Cursor Bugbot for commit 74504d3.