feat: implement burstable strategy for mem and cpu by mircea-pavel-anton · Pull Request #531 · robusta-dev/krr

mircea-pavel-anton · 2026-06-12T18:58:45Z

The purpose of this PR is to add a basic "burstable" recommendation strategy.

As far as I can tell, the current strategies enforce a QoS recommendation (limit == request) for memory, while allowing CPU to be burstable.

While this makes sense in certain scenarios, I'm also running KRR in my homelab to propose resource adjustments. In such cases, apps are most of the time idling, with occasional usage spikes. Setting requests and limits to the same value as to accommodate that spike would result In massive over provisioning in this situation.

Hence I'm working on getting this strategy going that can recommend individual values for requests and limits based on percentiles, just like CPU recommendations work in the other strategies.

For reference, I'm running my fork of KRR (the code in this PR) like so: https://github.com/mirceanton/home-ops/tree/main/apps%2Fjobs%2Fkrr%2Fapp

To generate this "dashboard": mirceanton/home-ops#559

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

CLAassistant · 2026-06-12T18:58:55Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

coderabbitai · 2026-06-12T18:58:59Z

Walkthrough

This PR introduces a new BurstableStrategy for Kubernetes resource recommendation based on percentile Prometheus metrics. It adds a PercentileMemoryLoader metric factory, a configurable strategy with settings for percentiles and buffers, CPU/memory proposal logic with HPA awareness and optional OOMKill memory bumping, and comprehensive tests.

Changes

BurstableStrategy with Percentile Metrics and OOMKill Integration

Layer / File(s)	Summary
Percentile Memory Metric Loader `robusta_krr/core/integrations/prometheus/metrics/memory.py`, `robusta_krr/core/integrations/prometheus/metrics/__init__.py`	`PercentileMemoryLoader(percentile)` factory validates percentile bounds and returns a `PrometheusMetric` subclass generating PromQL `quantile_over_time` expressions over `container_memory_working_set_bytes`.
BurstableStrategy Settings and Module Setup `robusta_krr/strategies/__init__.py`, `robusta_krr/strategies/burstable.py` (module helpers and settings)	`BurstableStrategySettings` defines configurable percentiles, buffers, history thresholds, and HPA/OOMKill options; `_named_loader` helper wraps metric factories with class naming; strategy is exported at package level.
BurstableStrategy Core Logic `robusta_krr/strategies/burstable.py` (strategy class and proposal methods)	`BurstableStrategy` computes CPU request from max percentile request (optionally omitting limit if disabled) and memory request from max percentile request; memory limit is the max of percentile-limit-plus-buffer and optional OOMKill-derived bumped value; both proposals validate data presence, enforce `points_required`, return undefined when HPA is detected (unless `allow_hpa` enabled), and handle NaN when data is insufficient.
BurstableStrategy Tests `tests/test_burstable.py`	CPU tests verify request/limit separation, disabled limit handling, and NaN on missing/insufficient data; memory tests verify buffering behavior, OOMKill limit bumping (including when ignored), and HPA-driven undefined behavior; CLI smoke tests validate command and settings structure.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.45% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	No pull request description was provided by the author, making it impossible to assess whether it relates to the changeset.	Add a pull request description explaining the purpose, implementation details, and testing approach for the burstable strategy feature.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: implementing a burstable strategy for memory and CPU resource recommendations.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

tests/test_burstable.py (1)

132-199: ⚡ Quick win

Add regression tests for empty limit loaders to prevent np.max([]) crashes.

Current tests don’t cover the case where request loader has data but limit loader is empty. Add one CPU and one memory test to ensure run() returns undefined rather than raising.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_burstable.py` around lines 132 - 199, Add two regression tests to
tests/test_burstable.py that cover the case where the request loader has data
but the corresponding limit loader is empty so run() does not call np.max([])
and instead returns undefined; specifically, add one test for CPU and one for
Memory using make_strategy, make_history_data (populate request values but set
e.g. "BurstableCpuLimitLoader" or "BurstableMemoryLimitLoader" to {}), call
strategy.run(history, make_object_data()), and assert that
result[ResourceType.Cpu].limit (and result[ResourceType.Memory].limit) is NaN
(use math.isnan) or that request is NaN as appropriate to match other tests;
reference the existing helpers make_strategy, make_history_data,
make_object_data, ResourceType, and strategy.run so the new tests follow the
same patterns as the surrounding tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@robusta_krr/core/integrations/prometheus/metrics/memory.py`:
- Line 88: The code is rounding the quantile value before emitting PromQL:
replace the use of round(percentile / 100, 2) with the precise percentile/100
value (or format it as a full-precision float string) so the emitted PromQL uses
the exact quantile; locate the place constructing the query (the expression
containing {round(percentile / 100, 2)}) where the variable percentile is used
and remove the rounding so quantiles like 95.5 and 99.9 become 0.955 and 0.999
in the emitted query.
- Line 83: The parameter name `object` in the method get_query(self, object:
K8sObjectData, duration: str, step: str) shadows a Python builtin and triggers
Ruff A002; rename the parameter (e.g., to `obj` or `k8s_obj`) in the get_query
signature and update all uses inside the method (and any callers in this
module/class) to the new name, keeping the type K8sObjectData unchanged so
behavior is preserved.

In `@robusta_krr/strategies/burstable.py`:
- Around line 137-141: The cpu and memory limit computations call np.max on
history_data["BurstableCPULimitLoader"] / "BurstableMemoryLimitLoader" values
without checking for emptiness; modify the cpu_limit and memory_limit
expressions (the code that computes cpu_limit and memory_limit in burstable.py)
to first check that the respective loader dict exists and has at least one
non-empty value (e.g., len(history_data.get("BurstableCPULimitLoader", {})) > 0
and that any(v.size for v in values) or similar) and only then call np.max,
otherwise set the limit to None (respecting disable_cpu_limit /
disable_memory_limit). Use the existing loader keys ("BurstableCPULimitLoader",
"BurstableMemoryLimitLoader") and history_data variable names so the fix is
local and avoids ValueError from np.max on empty sequences.

---

Nitpick comments:
In `@tests/test_burstable.py`:
- Around line 132-199: Add two regression tests to tests/test_burstable.py that
cover the case where the request loader has data but the corresponding limit
loader is empty so run() does not call np.max([]) and instead returns undefined;
specifically, add one test for CPU and one for Memory using make_strategy,
make_history_data (populate request values but set e.g.
"BurstableCpuLimitLoader" or "BurstableMemoryLimitLoader" to {}), call
strategy.run(history, make_object_data()), and assert that
result[ResourceType.Cpu].limit (and result[ResourceType.Memory].limit) is NaN
(use math.isnan) or that request is NaN as appropriate to match other tests;
reference the existing helpers make_strategy, make_history_data,
make_object_data, ResourceType, and strategy.run so the new tests follow the
same patterns as the surrounding tests.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2c60d03b-e80d-493a-95bf-a2070e82ab16

📥 Commits

Reviewing files that changed from the base of the PR and between 383275a and a9e2b4b.

📒 Files selected for processing (5)

robusta_krr/core/integrations/prometheus/metrics/__init__.py
robusta_krr/core/integrations/prometheus/metrics/memory.py
robusta_krr/strategies/__init__.py
robusta_krr/strategies/burstable.py
tests/test_burstable.py

coderabbitai · 2026-06-12T19:02:51Z

+        raise ValueError("percentile must be between 0 and 100")
+
+    class PercentileMemoryLoader(PrometheusMetric):
+        def get_query(self, object: K8sObjectData, duration: str, step: str) -> str:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rename the object parameter to avoid lint failure (Ruff A002).

Line 83 introduces a builtin-shadowing argument name (object), which is flagged by Ruff and may break CI lint gates.

💡 Suggested fix

- def get_query(self, object: K8sObjectData, duration: str, step: str) -> str: - pods_selector = "|".join(pod.name for pod in object.pods) + def get_query(self, object_data: K8sObjectData, duration: str, step: str) -> str: + pods_selector = "|".join(pod.name for pod in object_data.pods) cluster_label = self.get_prometheus_cluster_label() return f""" quantile_over_time( {percentile / 100}, max( container_memory_working_set_bytes{{ - namespace="{object.namespace}", + namespace="{object_data.namespace}", pod=~"{pods_selector}", - container="{object.container}" + container="{object_data.container}" {cluster_label} }}

🧰 Tools

🪛 Ruff (0.15.15)

[error] 83-83: Function argument object is shadowing a Python builtin

(A002)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@robusta_krr/core/integrations/prometheus/metrics/memory.py` at line 83, The parameter name `object` in the method get_query(self, object: K8sObjectData, duration: str, step: str) shadows a Python builtin and triggers Ruff A002; rename the parameter (e.g., to `obj` or `k8s_obj`) in the get_query signature and update all uses inside the method (and any callers in this module/class) to the new name, keeping the type K8sObjectData unchanged so behavior is preserved.

Source: Linters/SAST tools

coderabbitai · 2026-06-12T19:02:51Z

+            cluster_label = self.get_prometheus_cluster_label()
+            return f"""
+                quantile_over_time(
+                    {round(percentile / 100, 2)},


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not round the quantile value before emitting PromQL.

Line 88 rounds percentile / 100 to 2 decimals, which silently shifts requested percentiles (e.g., 95.5 → 0.95, 99.9 → 1.0). That changes recommendation behavior.

💡 Suggested fix

- {round(percentile / 100, 2)}, + {percentile / 100},

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

{round(percentile / 100, 2)},

{percentile / 100},

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@robusta_krr/core/integrations/prometheus/metrics/memory.py` at line 88, The code is rounding the quantile value before emitting PromQL: replace the use of round(percentile / 100, 2) with the precise percentile/100 value (or format it as a full-precision float string) so the emitted PromQL uses the exact quantile; locate the place constructing the query (the expression containing {round(percentile / 100, 2)}) where the variable percentile is used and remove the rounding so quantiles like 95.5 and 99.9 become 0.955 and 0.999 in the emitted query.

coderabbitai · 2026-06-12T19:02:52Z

+        cpu_limit = (
+            None
+            if self.settings.disable_cpu_limit
+            else np.max([v[0, 1] for v in history_data["BurstableCPULimitLoader"].values()])
+        )


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard limit-series loaders before np.max to avoid runtime exceptions.

In both CPU (Lines 137-141) and memory (Lines 178-182), np.max is called on loader values without checking that the loader dict is non-empty. If the limit series is missing/sparse while request series exists, this throws ValueError and fails the strategy run.

💡 Suggested fix

cpu_request = np.max([v[0, 1] for v in data_req.values()]) + data_limit = history_data["BurstableCPULimitLoader"] + if not self.settings.disable_cpu_limit and len(data_limit) == 0: + return ResourceRecommendation.undefined(info="No data") cpu_limit = ( None if self.settings.disable_cpu_limit - else np.max([v[0, 1] for v in history_data["BurstableCPULimitLoader"].values()]) + else np.max([v[0, 1] for v in data_limit.values()]) ) @@ memory_request = np.max([v[0, 1] for v in data_req.values()]) - memory_limit_base = np.max([v[0, 1] for v in history_data["BurstableMemoryLimitLoader"].values()]) + data_limit = history_data["BurstableMemoryLimitLoader"] + if len(data_limit) == 0: + return ResourceRecommendation.undefined(info="No data") + memory_limit_base = np.max([v[0, 1] for v in data_limit.values()])

Also applies to: 178-182

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@robusta_krr/strategies/burstable.py` around lines 137 - 141, The cpu and memory limit computations call np.max on history_data["BurstableCPULimitLoader"] / "BurstableMemoryLimitLoader" values without checking for emptiness; modify the cpu_limit and memory_limit expressions (the code that computes cpu_limit and memory_limit in burstable.py) to first check that the respective loader dict exists and has at least one non-empty value (e.g., len(history_data.get("BurstableCPULimitLoader", {})) > 0 and that any(v.size for v in values) or similar) and only then call np.max, otherwise set the limit to None (respecting disable_cpu_limit / disable_memory_limit). Use the existing loader keys ("BurstableCPULimitLoader", "BurstableMemoryLimitLoader") and history_data variable names so the fix is local and avoids ValueError from np.max on empty sequences.

feat: implement burstable strategy for mem and cpu

a9e2b4b

Signed-off-by: Mircea-Pavel ANTON <contact@mirceanton.com>

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

mircea-pavel-anton marked this pull request as draft June 12, 2026 19:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: implement burstable strategy for mem and cpu#531

feat: implement burstable strategy for mem and cpu#531
mircea-pavel-anton wants to merge 1 commit into
robusta-dev:mainfrom
mircea-pavel-anton:feat/burstable-strategy

mircea-pavel-anton commented Jun 12, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

coderabbitai Bot Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mircea-pavel-anton commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CLAassistant commented Jun 12, 2026

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mircea-pavel-anton commented Jun 12, 2026 •

edited

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading