Skip to content

NE-2715: Update keepalived-ipfailover to replace iptables with nftables#235

Open
Thealisyed wants to merge 1 commit into
openshift:masterfrom
Thealisyed:NE-2715-ipfailover-nftables
Open

NE-2715: Update keepalived-ipfailover to replace iptables with nftables#235
Thealisyed wants to merge 1 commit into
openshift:masterfrom
Thealisyed:NE-2715-ipfailover-nftables

Conversation

@Thealisyed

@Thealisyed Thealisyed commented May 26, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Replace all iptables usage in keepalived-ipfailover with native nft commands
  • Remove iptables-scripts/ chroot wrappers (no longer needed since nft runs directly in the container)
  • Install nftables package in container image
  • Create a dedicated inet keepalived nft table for the VRRP multicast accept rule (224.0.0.18)
  • Preserve OPENSHIFT_HA_IPTABLES_CHAIN env var for backward compatibility

Why

iptables is deprecated in RHEL 9 and the ip_tables kernel module is removed in RHEL 10. The iptables-nft userspace shim remains but is deprecated. This migrates to native nft before the shim is removed entirely.

Summary by CodeRabbit

  • Refactor
    • Firewall management migrated from iptables to nftables for improved performance and long-term support.
    • Keepalived multicast handling updated to use nftables filtering and an inet table/chain for 224.0.0.18.
    • Removed legacy wrapper scripts no longer required by the modernized firewall approach.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 26, 2026
@openshift-ci-robot

openshift-ci-robot commented May 26, 2026

Copy link
Copy Markdown
Contributor

@Thealisyed: This pull request references NE-2715 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Replace all iptables usage in keepalived-ipfailover with native nft commands
  • Remove iptables-scripts/ chroot wrappers (no longer needed since nft runs directly in the container)
  • Install nftables package in container image
  • Create a dedicated inet keepalived nft table for the VRRP multicast accept rule (224.0.0.18)
  • Preserve OPENSHIFT_HA_IPTABLES_CHAIN env var for backward compatibility

Why

iptables is deprecated in RHEL 9 and the ip_tables kernel module is removed in RHEL 10. The iptables-nft userspace shim remains but is deprecated. This migrates to native nft before the shim is removed entirely.

What changed

File Change
ipfailover/keepalived/iptables-scripts/* Deleted — 6 chroot wrappers no longer needed
ipfailover/keepalived/Dockerfile Added nftables package, removed COPY lines for wrapper scripts
ipfailover/keepalived/conf/settings.sh Updated comments to explain nft behavior
ipfailover/keepalived/lib/failover-functions.sh Replaced iptables calls with nft equivalents

Test plan

  • Build container image and verify nft binary is available
  • Deploy ipfailover with OPENSHIFT_HA_IPTABLES_CHAIN=INPUT, verify nft list table inet keepalived shows multicast rule
  • Deploy ipfailover without OPENSHIFT_HA_IPTABLES_CHAIN, verify no nft table is created
  • Verify VIP failover works correctly between nodes
  • Test upgrade from iptables-based image to nft-based image without VIP interruption

Jira

https://redhat.atlassian.net/browse/NE-2715

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown

Walkthrough

This PR migrates the keepalived container from using iptables wrapper scripts to nftables for managing multicast firewall rules. The nftables package is installed, deprecated wrapper scripts are removed, configuration documentation is updated, and core failover setup/teardown logic switches to nftables table and rule management.

Changes

iptables to nftables migration for keepalived multicast

Layer / File(s) Summary
Container environment and package installation
ipfailover/keepalived/Dockerfile
Dockerfile adds nftables to the INSTALL_PKGS list for installation during the RUN step, enabling nft-based firewall rule management.
Configuration documentation update
ipfailover/keepalived/conf/settings.sh
Documentation for HA_IPTABLES_CHAIN is updated to describe that nft rules are created for keepalived multicast traffic (224.0.0.18) and that the chain name value is ignored because nft manages its own table and chain.
Remove deprecated iptables wrapper scripts
ipfailover/keepalived/iptables-scripts/ip6tables-restore
ip6tables-restore wrapper script that chrooted into /host and forwarded calls to ip6tables-restore is removed.
Migrate failover functions to nftables
ipfailover/keepalived/lib/failover-functions.sh
setup_failover checks for and creates the inet keepalived nft table with a filter input chain and an ip daddr 224.0.0.18 accept rule. unconfigure_failover deletes the inet keepalived nft table instead of removing iptables rules.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main objective of the changeset: replacing iptables with nftables in the keepalived-ipfailover component. It directly reflects the core changes across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR contains no Ginkgo tests. Go test files in the diff use standard testing.T, not Ginkgo. The ipfailover changes are shell scripts/configuration files, not Go tests.
Test Structure And Quality ✅ Passed PR contains no Ginkgo tests; includes only standard Go unit tests using testing.T, making this Ginkgo-specific check not applicable.
Microshift Test Compatibility ✅ Passed PR contains no Ginkgo e2e tests. Changes are only to Dockerfile, settings.sh, and failover-functions.sh (container configuration and shell scripts). Check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. The changes are container/infrastructure updates (Dockerfile, shell scripts) for keepalived-ipfailover to use nftables instead of iptables.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only container image build files and runtime scripts, not deployment manifests, operator code, or controllers. Topology-aware scheduling check does not apply.
Ote Binary Stdout Contract ✅ Passed This PR is not applicable to the OTE Binary Stdout Contract check. The PR only modifies shell scripts and Docker configuration for ipfailover/keepalived, not OTE test binaries.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. All 16 changed files are infrastructure-related (Dockerfiles, shell scripts, documentation). Check is not applicable.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from frobware and knobunc May 26, 2026 12:17

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Line 14: The README.md contains a testing artifact line "//For my testing
purpose" that should not be published; remove that exact line from README.md (or
replace it with appropriate documentation text) so the README only contains
intended documentation content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 522cf7a2-4f21-4b3a-966a-31dc5b172233

📥 Commits

Reviewing files that changed from the base of the PR and between 4431812 and b3b419e.

📒 Files selected for processing (10)
  • README.md
  • ipfailover/keepalived/Dockerfile
  • ipfailover/keepalived/conf/settings.sh
  • ipfailover/keepalived/iptables-scripts/ip6tables
  • ipfailover/keepalived/iptables-scripts/ip6tables-restore
  • ipfailover/keepalived/iptables-scripts/ip6tables-save
  • ipfailover/keepalived/iptables-scripts/iptables
  • ipfailover/keepalived/iptables-scripts/iptables-restore
  • ipfailover/keepalived/iptables-scripts/iptables-save
  • ipfailover/keepalived/lib/failover-functions.sh
💤 Files with no reviewable changes (6)
  • ipfailover/keepalived/iptables-scripts/iptables
  • ipfailover/keepalived/iptables-scripts/iptables-restore
  • ipfailover/keepalived/iptables-scripts/ip6tables-restore
  • ipfailover/keepalived/iptables-scripts/iptables-save
  • ipfailover/keepalived/iptables-scripts/ip6tables
  • ipfailover/keepalived/iptables-scripts/ip6tables-save

Comment thread ipfailover/keepalived/lib/failover-functions.sh
Comment thread README.md Outdated
iptables is deprecated in RHEL 9 and the ip_tables kernel module is
removed in RHEL 10. This replaces all iptables usage in the ipfailover
component with native nft commands.

Changes:
- Remove iptables-scripts/ chroot wrappers (no longer needed since nft
  runs directly in the container)
- Install nftables package in Dockerfile
- Replace iptables calls in failover-functions.sh with nft equivalents
  that create a dedicated "inet keepalived" table for the multicast
  accept rule (224.0.0.18)
- Preserve OPENSHIFT_HA_IPTABLES_CHAIN env var for backward compat

Assisted with Claude
@Thealisyed Thealisyed force-pushed the NE-2715-ipfailover-nftables branch from b3b419e to bd18a29 Compare May 26, 2026 12:36
@Thealisyed

Copy link
Copy Markdown
Contributor Author

@coderabbitai can you do a quick review is all good?

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== README.md diff area ===" 
cat README.md | head -30

echo ""
echo "=== Dockerfile ==="
cat ipfailover/keepalived/Dockerfile

echo ""
echo "=== failover-functions.sh ==="
cat ipfailover/keepalived/lib/failover-functions.sh

echo ""
echo "=== settings.sh ==="
cat ipfailover/keepalived/conf/settings.sh

echo ""
echo "=== iptables-scripts dir (should be gone or empty) ==="
ls ipfailover/keepalived/iptables-scripts/ 2>/dev/null || echo "Directory removed or empty"

Length of output: 8651


🏁 Script executed:

#!/bin/bash
echo "=== Full README.md ==="
cat README.md

echo ""
echo "=== Check nft heredoc indentation (tabs vs spaces) ==="
cat -A ipfailover/keepalived/lib/failover-functions.sh | grep -A5 'NFT'

echo ""
echo "=== Check if nftables listed twice in Dockerfile ==="
grep -c 'nftables' ipfailover/keepalived/Dockerfile

echo ""
echo "=== Check any remaining references to iptables in shell scripts ==="
rg -n 'iptables' ipfailover/keepalived/ --type sh

Length of output: 1675



@Thealisyed — here's a quick review. Overall the migration looks solid and clean. A few observations:


✅ What looks good

  • iptables-scripts/ fully removed, no stray references left in any shell script.
  • Dockerfile: nftables added once, no duplication.
  • Heredoc indentation: Uses tabs throughout — correct for <<- (tabs are stripped as expected).
  • Backward compatibility: OPENSHIFT_HA_IPTABLES_CHAIN env var preserved; comments in settings.sh updated clearly.
  • unconfigure_failover: Checks for table existence before deleting — defensive and correct.

⚠️ Minor concerns

  1. Fragile idempotency check in setup_failover

    nft list chain inet keepalived filter 2>/dev/null | grep -q 'ip daddr 224.0.0.18 accept'

    The grep pattern must exactly match nft's output format. Depending on the nftables version, the output could include extra spacing, resolved hostnames, or hex representations of the address — causing a false negative and attempting to re-add the table. Consider using a more robust check:

    if ! nft list table inet keepalived > /dev/null 2>&1; then

    Since you control the entire table, checking for table existence is sufficient and more reliable than grepping rule text.

  2. No error handling on nft commands
    If nft delete table inet keepalived or nft -f - fails (e.g. permission error, kernel module issue), the failure is silently swallowed. Adding a || echo "WARNING: nft command failed" would make failures visible in pod logs.

  3. README.md test comment
    The AI-generated diff summary flagged a //For my testing purpose comment added to README.md from a "testing commit." The current branch HEAD shows the file is clean, so it appears this was either already dropped or the summary was stale — but worth double-checking your commit history (git log --oneline) to make sure no leftover test artifacts remain.


Summary

The core migration (iptables → nft, removal of chroot wrappers, dedicated inet keepalived table) is implemented correctly. The two items above are robustness/observability improvements rather than blockers. The test plan in the PR description covers the critical paths well. Once verified per your test plan, this looks ready to go.

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ipfailover/keepalived/lib/failover-functions.sh`:
- Around line 62-75: The heredoc invocation of nft (-f -) inside the
HA_IPTABLES_CHAIN block may fail silently; update the block in
failover-functions.sh to capture the exit status of the nft -f - operation
(after invoking nft with the provided heredoc for table/chain creation), and if
it fails (non-zero exit code) log a clear error including the attempted rule
("ip daddr 224.0.0.18 accept" / "nft -f -") via echo or process logger and exit
non-zero (or return failure) so setup does not continue without the multicast
accept rule; also consider checking nft availability before attempting the
heredoc (e.g., via command -v nft) to produce a more specific error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7499d2f7-b512-4081-bb8b-b9f59783b1d8

📥 Commits

Reviewing files that changed from the base of the PR and between b3b419e and bd18a29.

📒 Files selected for processing (9)
  • ipfailover/keepalived/Dockerfile
  • ipfailover/keepalived/conf/settings.sh
  • ipfailover/keepalived/iptables-scripts/ip6tables
  • ipfailover/keepalived/iptables-scripts/ip6tables-restore
  • ipfailover/keepalived/iptables-scripts/ip6tables-save
  • ipfailover/keepalived/iptables-scripts/iptables
  • ipfailover/keepalived/iptables-scripts/iptables-restore
  • ipfailover/keepalived/iptables-scripts/iptables-save
  • ipfailover/keepalived/lib/failover-functions.sh
💤 Files with no reviewable changes (6)
  • ipfailover/keepalived/iptables-scripts/ip6tables-save
  • ipfailover/keepalived/iptables-scripts/ip6tables-restore
  • ipfailover/keepalived/iptables-scripts/iptables-restore
  • ipfailover/keepalived/iptables-scripts/iptables-save
  • ipfailover/keepalived/iptables-scripts/iptables
  • ipfailover/keepalived/iptables-scripts/ip6tables
✅ Files skipped from review due to trivial changes (1)
  • ipfailover/keepalived/conf/settings.sh

Comment thread ipfailover/keepalived/lib/failover-functions.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
ipfailover/keepalived/lib/failover-functions.sh (1)

62-74: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make nft setup idempotent for partial state and fail fast when nft apply fails.

If inet keepalived already exists but is incomplete, nft -f - can fail on table redefinition and setup still proceeds without guaranteeing ip daddr 224.0.0.18 accept.

Suggested fix
   if [[ -n "${HA_IPTABLES_CHAIN:-}" ]]; then
     echo "  - Ensuring nft rule for keepalived multicast (224.0.0.18) ..."
-    if ! nft list chain inet keepalived filter 2>/dev/null | grep -q 'ip daddr 224.0.0.18 accept' ; then
-      echo "  - Adding nft rule to accept multicast 224.0.0.18."
-      nft -f - <<-'NFT'
-	table inet keepalived {
-	  chain filter {
-	    type filter hook input priority 0; policy accept;
-	    ip daddr 224.0.0.18 accept
-	  }
-	}
-	NFT
-    fi
+    nft list table inet keepalived >/dev/null 2>&1 || \
+      nft add table inet keepalived || { echo "ERROR: failed to create nft table inet keepalived"; return 1; }
+
+    nft list chain inet keepalived filter >/dev/null 2>&1 || \
+      nft add chain inet keepalived filter '{ type filter hook input priority 0; policy accept; }' || { echo "ERROR: failed to create nft chain inet keepalived filter"; return 1; }
+
+    if ! nft list chain inet keepalived filter 2>/dev/null | grep -q 'ip daddr 224.0.0.18 accept' ; then
+      echo "  - Adding nft rule to accept multicast 224.0.0.18."
+      nft add rule inet keepalived filter ip daddr 224.0.0.18 accept || { echo "ERROR: failed to add nft rule ip daddr 224.0.0.18 accept"; return 1; }
+    fi
   fi
#!/bin/bash
set -euo pipefail

file="$(fd --full-path 'ipfailover/keepalived/lib/failover-functions.sh$' | head -n1)"
python - <<'PY'
import re, pathlib
p = pathlib.Path("ipfailover/keepalived/lib/failover-functions.sh")
t = p.read_text()
m = re.search(r'if \[\[ -n "\$\{HA_IPTABLES_CHAIN:-\}" \]\]; then(.*?)\n\s*fi', t, re.S)
block = m.group(1) if m else ""
print("has_heredoc_apply:", "nft -f -" in block)
print("has_table_guard:", "nft list table inet keepalived" in block)
print("has_failfast_return:", "return 1" in block)
PY

Expected confirmation for current code: heredoc apply is present, while explicit table/chain guards and fail-fast return in this block are absent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipfailover/keepalived/lib/failover-functions.sh` around lines 62 - 74, The
current HA_IPTABLES_CHAIN block uses a heredoc with "nft -f -" without guarding
for partial existing state and doesn't fail fast; update the block to first
probe the existing state with "nft list table inet keepalived" and/or "nft list
chain inet keepalived filter" and only run the heredoc to create missing pieces
(or run targeted "nft add table"/"nft add chain"/"nft add rule" commands) when
those probes show resources are absent, and capture the exit status of the nft
apply (the heredoc or the targeted commands) and immediately return 1 (or exit
non-zero) on failure so the caller sees the error; keep these checks and the
fail-fast return inside the same HA_IPTABLES_CHAIN conditional so the behavior
is idempotent and errors are propagated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@ipfailover/keepalived/lib/failover-functions.sh`:
- Around line 62-74: The current HA_IPTABLES_CHAIN block uses a heredoc with
"nft -f -" without guarding for partial existing state and doesn't fail fast;
update the block to first probe the existing state with "nft list table inet
keepalived" and/or "nft list chain inet keepalived filter" and only run the
heredoc to create missing pieces (or run targeted "nft add table"/"nft add
chain"/"nft add rule" commands) when those probes show resources are absent, and
capture the exit status of the nft apply (the heredoc or the targeted commands)
and immediately return 1 (or exit non-zero) on failure so the caller sees the
error; keep these checks and the fail-fast return inside the same
HA_IPTABLES_CHAIN conditional so the behavior is idempotent and errors are
propagated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 33de2090-fe7b-49e1-a8ea-9918ab0cf1d6

📥 Commits

Reviewing files that changed from the base of the PR and between 4431812 and bd18a29.

📒 Files selected for processing (9)
  • ipfailover/keepalived/Dockerfile
  • ipfailover/keepalived/conf/settings.sh
  • ipfailover/keepalived/iptables-scripts/ip6tables
  • ipfailover/keepalived/iptables-scripts/ip6tables-restore
  • ipfailover/keepalived/iptables-scripts/ip6tables-save
  • ipfailover/keepalived/iptables-scripts/iptables
  • ipfailover/keepalived/iptables-scripts/iptables-restore
  • ipfailover/keepalived/iptables-scripts/iptables-save
  • ipfailover/keepalived/lib/failover-functions.sh
💤 Files with no reviewable changes (6)
  • ipfailover/keepalived/iptables-scripts/iptables-save
  • ipfailover/keepalived/iptables-scripts/ip6tables
  • ipfailover/keepalived/iptables-scripts/ip6tables-restore
  • ipfailover/keepalived/iptables-scripts/iptables-restore
  • ipfailover/keepalived/iptables-scripts/ip6tables-save
  • ipfailover/keepalived/iptables-scripts/iptables

@Thealisyed

Copy link
Copy Markdown
Contributor Author

/retest

@openshift-ci

openshift-ci Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

@Thealisyed: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@rikatz

rikatz commented Jun 5, 2026

Copy link
Copy Markdown
Member

/assign

if iptables -C ${chain} 1 -d 224.0.0.18/32 -j ACCEPT ; then
echo " - Removing keepalived multicast iptables rule ..."
iptables -D ${chain} 1 -d 224.0.0.18/32 -j ACCEPT
if [[ -n "${HA_IPTABLES_CHAIN:-}" ]]; then

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hum, this change is weird. I understand we don't rely on the chain name anymore, but in that case what is passed from the caller (settings.sh) to this function, and what it will receive? Can this be used? Should the variable name be something else?

Just some thinking if this will properly work

@Thealisyed Thealisyed Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

settings.sh passes a chain name like "INPUT" via OPENSHIFT_HA_IPTABLES_CHAIN. The old code used it directly but nft doesn't insert into existing chains, it creates its own isolated table (inet keepalived hooked via type/hook/priority).

So the chain name value is architecturally irrelevant to nft, only non emptiness matters. The variable name is kept for backward compatibility with existing deployments.

What do you think?

Comment thread ipfailover/keepalived/lib/failover-functions.sh
iptables -I ${chain} 1 -d 224.0.0.18/32 -j ACCEPT
if [[ -n "${HA_IPTABLES_CHAIN:-}" ]]; then
echo " - Ensuring nft rule for keepalived multicast (224.0.0.18) ..."
if ! nft list chain inet keepalived filter 2>/dev/null | grep -q 'ip daddr 224.0.0.18 accept' ; then

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if this command errors?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same behavior as the old code iptables -C / iptables -I had no error handling either. The chain policy is accept, so the multicast rule is defense in depth, not load-bearing.
Adding here would make setup fatal where it was previously best effort, which would be a behavior change?

@rikatz

rikatz commented Jun 9, 2026

Copy link
Copy Markdown
Member

/lgtm
/approve
Thanks. Changes looks good

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2026
@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rikatz, Thealisyed
Once this PR has been reviewed and has the lgtm label, please assign candita for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@melvinjoseph86

Copy link
Copy Markdown

Verified using cluster bot create OSP cluster. No issue with the existing test case.

➜  oc get clusterversion
NAME      VERSION                                               AVAILABLE   PROGRESSING   SINCE   STATUS
version   5.0.0-0-2026-06-17-065633-test-ci-ln-4kmwqrb-latest   True        False         127m    Cluster version is 5.0.0-0-2026-06-17-065633-test-ci-ln-4kmwqrb-latest
➜   Oc get infrastructure cluster -o=jsonpath={.spec.platformSpec.type}                                                                                   
OpenStack%  
➜   ./bin/extended-platform-tests run all --dry-run | grep -E  "41025|41027|41028|41029|41030|49214"| ./bin/extended-platform-tests run --junit-dir=./ -f-
  I0617 14:44:45.700355 59255 test.go:180] Found authentication type used: 
  I0617 14:44:45.700747 59255 test_context.go:566] The --provider flag is not set. Continuing as if --provider=skeleton had been used.
  I0617 14:44:48.039855 59255 api.go:57] EnvIsKubernetesCluster = no, start monitoring ClusterOperators and ClusterVersions
started: (0/1/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41029-ipfailover can support up to a maximum of 255 VIPs for the entire cluster [Serial]"

passed: (1m53s) 2026-06-17T09:16:41 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41029-ipfailover can support up to a maximum of 255 VIPs for the entire cluster [Serial]"

started: (0/2/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-49214-Excluding the existing VRRP cluster ID from ipfailover deployments [Serial]"

passed: (1m52s) 2026-06-17T09:18:33 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-49214-Excluding the existing VRRP cluster ID from ipfailover deployments [Serial]"

started: (0/3/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41027-pod and service automatically switched over to standby when master fails [Disruptive] [Serial]"

passed: (2m22s) 2026-06-17T09:20:54 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41027-pod and service automatically switched over to standby when master fails [Disruptive] [Serial]"

started: (0/4/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41028-ipfailover configuration can be customized by ENV [Serial]"

passed: (3m42s) 2026-06-17T09:24:36 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41028-ipfailover configuration can be customized by ENV [Serial]"

started: (0/5/6) "[sig-network-edge] Network_Edge Author:hongli-NonHyperShiftHOST-ConnectedOnly-Critical-41025-support to deploy ipfailover [Serial]"

passed: (1m4s) 2026-06-17T09:25:40 "[sig-network-edge] Network_Edge Author:hongli-NonHyperShiftHOST-ConnectedOnly-Critical-41025-support to deploy ipfailover [Serial]"

started: (0/6/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-High-41030-preemption strategy for keepalived ipfailover [Disruptive] [Serial]"

passed: (5m6s) 2026-06-17T09:30:47 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-High-41030-preemption strategy for keepalived ipfailover [Disruptive] [Serial]"

Writing JUnit report to junit_e2e_20260617-093047.xml

6 pass, 0 skip (15m59s)

/verified by @mjoseph

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jun 17, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@melvinjoseph86: This PR has been marked as verified by @mjoseph.

Details

In response to this:

Verified using cluster bot create OSP cluster. No issue with the existing test case.

➜  oc get clusterversion
NAME      VERSION                                               AVAILABLE   PROGRESSING   SINCE   STATUS
version   5.0.0-0-2026-06-17-065633-test-ci-ln-4kmwqrb-latest   True        False         127m    Cluster version is 5.0.0-0-2026-06-17-065633-test-ci-ln-4kmwqrb-latest
➜   Oc get infrastructure cluster -o=jsonpath={.spec.platformSpec.type}                                                                                   
OpenStack%  
➜   ./bin/extended-platform-tests run all --dry-run | grep -E  "41025|41027|41028|41029|41030|49214"| ./bin/extended-platform-tests run --junit-dir=./ -f-
 I0617 14:44:45.700355 59255 test.go:180] Found authentication type used: 
 I0617 14:44:45.700747 59255 test_context.go:566] The --provider flag is not set. Continuing as if --provider=skeleton had been used.
 I0617 14:44:48.039855 59255 api.go:57] EnvIsKubernetesCluster = no, start monitoring ClusterOperators and ClusterVersions
started: (0/1/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41029-ipfailover can support up to a maximum of 255 VIPs for the entire cluster [Serial]"

passed: (1m53s) 2026-06-17T09:16:41 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41029-ipfailover can support up to a maximum of 255 VIPs for the entire cluster [Serial]"

started: (0/2/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-49214-Excluding the existing VRRP cluster ID from ipfailover deployments [Serial]"

passed: (1m52s) 2026-06-17T09:18:33 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-49214-Excluding the existing VRRP cluster ID from ipfailover deployments [Serial]"

started: (0/3/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41027-pod and service automatically switched over to standby when master fails [Disruptive] [Serial]"

passed: (2m22s) 2026-06-17T09:20:54 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41027-pod and service automatically switched over to standby when master fails [Disruptive] [Serial]"

started: (0/4/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41028-ipfailover configuration can be customized by ENV [Serial]"

passed: (3m42s) 2026-06-17T09:24:36 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-Medium-41028-ipfailover configuration can be customized by ENV [Serial]"

started: (0/5/6) "[sig-network-edge] Network_Edge Author:hongli-NonHyperShiftHOST-ConnectedOnly-Critical-41025-support to deploy ipfailover [Serial]"

passed: (1m4s) 2026-06-17T09:25:40 "[sig-network-edge] Network_Edge Author:hongli-NonHyperShiftHOST-ConnectedOnly-Critical-41025-support to deploy ipfailover [Serial]"

started: (0/6/6) "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-High-41030-preemption strategy for keepalived ipfailover [Disruptive] [Serial]"

passed: (5m6s) 2026-06-17T09:30:47 "[sig-network-edge] Network_Edge Author:mjoseph-NonHyperShiftHOST-ConnectedOnly-High-41030-preemption strategy for keepalived ipfailover [Disruptive] [Serial]"

Writing JUnit report to junit_e2e_20260617-093047.xml

6 pass, 0 skip (15m59s)

/verified by @mjoseph

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

FROM registry.ci.openshift.org/ocp/4.18:base-rhel9

RUN INSTALL_PKGS="kmod keepalived iproute psmisc nmap-ncat net-tools ipset ipset-libs procps-ng" && \
RUN INSTALL_PKGS="kmod keepalived iproute psmisc nmap-ncat net-tools ipset ipset-libs procps-ng nftables" && \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth documenting why we didn't want to add iptables pkg to the Dockerfile, but it's okay to add nftables pkg.

I read that "standard containers lack the kernel authority to alter routing and firewall namespaces. You must grant the container explicit networking capabilities when launching iptables", so if that is why we did the iptables scripts instead of instaling the pkg, we may need to do the same for nftables.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code didn't ship the binary for iptables. Instead it used chroot wrapper scripts to borrow the host's iptables binary (here)

We can install nftables directly in the container because it communicates with the kernel via netlink sockets therefore the binary version doesn't need to match the host kernel. The old chroot pattern existed because iptables had tighter coupling between the userspace binary and the host's kernel module.

No new privileges are needed as the container already runs with --privileged=true --net=host which covers what nft needs.

@candita candita Jun 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document your tests that show this works on both rhel9 and rhel10. Also, if you could document that this works even when the binaries don't match, that would be definitive.

# created to allow keepalived multicast (224.0.0.18) traffic. The actual
# chain name value is ignored; nft manages its own table/chain.
# Env var name kept for backward compatibility.
HA_IPTABLES_CHAIN="${OPENSHIFT_HA_IPTABLES_CHAIN:-""}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using both an old and new env var name, so we can eventually deprecate the old one. Definitely rename our local variable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'll add a new env var OPENSHIFT_HA_NFTABLES_RULE (or do you prefer a different name?) and fall back to the old one, something like:

HA_MULTICAST_ACCEPT="${OPENSHIFT_HA_NFTABLES_RULE:-${OPENSHIFT_HA_IPTABLES_CHAIN:-""}}"

This way the old env var still works, new deployments can use the new name and the local variable reflects what it actually controls.

iptables -D ${chain} 1 -d 224.0.0.18/32 -j ACCEPT
if [[ -n "${HA_IPTABLES_CHAIN:-}" ]]; then
if nft list table inet keepalived > /dev/null 2>&1 ; then
echo " - Removing keepalived multicast nft rules ..."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
echo " - Removing keepalived multicast nft rules ..."
echo " - Removing keepalived multicast nftables rules ..."

if [[ -n "${HA_IPTABLES_CHAIN:-}" ]]; then
if nft list table inet keepalived > /dev/null 2>&1 ; then
echo " - Removing keepalived multicast nft rules ..."
nft delete table inet keepalived

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were just removing a rule before, but now we delete the whole table?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes
With iptables we were removing a single rule from a shared chain that other things also used, so we had to delete just our rule. With nft, we own the entire inet keepalived table and nothing else uses it.
Deleting the whole table is the clean equivalent and safer since it leaves no rules behind.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it takes some time to recreate the table, and that could introduce race conditions and possible transient security issues.

# Add the rule to the beginning of the chain.
echo " - adding iptables rule to $chain to access 224.0.0.18."
iptables -I ${chain} 1 -d 224.0.0.18/32 -j ACCEPT
if [[ -n "${HA_IPTABLES_CHAIN:-}" ]]; then

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments to explain what's going on. The more the better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants