feat(rcv1p): unify cert bootstrap flow and add Windows CA refresh task #8096
rchincha wants to merge 120 commits into
Pull request overview
This PR aims to unify the custom-cloud CA certificate bootstrap path (removing the separate “operation-requests” init scripts) and adds a Windows scheduled task to periodically refresh custom-cloud CA certificates.
Changes:
- Windows: add a scheduled task to refresh custom-cloud CA certificates; update Get-CACertificates to support legacy vs “rcv1p” modes keyed off location.
- Linux: consolidate custom-cloud init to a single init script and update CSE command generation to set a cert-endpoint mode variable.
- Regenerate multiple custom data / generated command snapshots to reflect the new templates.
Reviewed changes
Copilot reviewed 74 out of 176 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.ps1 | Adds CA refresh scheduled task + updates CA retrieval logic and error behavior |
| parts/windows/kuberneteswindowssetup.ps1 | Wires Get-CACertificates -Location and registers refresh task for custom clouds |
| pkg/agent/variables.go | Always injects initAKSCustomCloud payload into cloud-init data |
| pkg/agent/const.go | Removes separate custom-cloud init script constants; keeps single init script |
| pkg/agent/baker.go | Simplifies GetTargetEnvironment; notes IsAKSCustomCloud as deprecated |
| parts/linux/cloud-init/artifacts/cse_cmd.sh | Updates CSE command to set cert endpoint mode + run custom-cloud init script |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Deleted (custom-cloud init consolidation) |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Deleted (custom-cloud init consolidation) |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Deleted (custom-cloud init consolidation) |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Mirrors CSE command template updates for aks-node-controller parser |
| aks-node-controller/parser/testdata/Compatibility+EmptyConfig/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AzureLinuxv2+Kata+DisableUnattendedUpgrades=false/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+SSHStatusOn/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+EnablePubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+DisablePubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+DefaultPubkeyAuth/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CustomOSConfig/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CustomCloud/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+Containerd+MIG/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| aks-node-controller/parser/testdata/AKSUbuntu2204+CloudProviderOverrides/generatedCSECommand | New snapshot for new template output |
| aks-node-controller/parser/testdata/AKSUbuntu2204+China/generatedCSECommand | Regenerated snapshot for new CSE cmd template |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOff/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Regenerated snapshot (custom data gzip payload changed) |
| pkg/agent/testdata/Flatcar/CustomData.inner | Regenerated snapshot (embedded gzip payload changed) |
| pkg/agent/testdata/ACL/CustomData.inner | Regenerated snapshot (embedded gzip payload changed) |
44ff9ee to a0a1307
Pull request overview
This PR aims to unify AKS custom-cloud CA certificate bootstrap behavior (legacy vs “rcv1p/operation-requests” style flows) and adds a Windows scheduled task to periodically refresh custom-cloud CA certificates.
Changes:
- Adds Windows CA refresh scheduled task registration and introduces location-based endpoint-mode selection (legacy vs rcv1p).
- Refactors Windows CA certificate retrieval to support both endpoint modes and opt-in gating for rcv1p.
- Simplifies Linux custom-cloud init script selection by consolidating onto init-aks-custom-cloud.sh and removing older variants; updates generated testdata accordingly.
Reviewed changes
Copilot reviewed 93 out of 99 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.ps1 | Adds CA refresh scheduled task and endpoint-mode-aware Get-CACertificates implementation. |
| pkg/agent/variables.go | Simplifies how initAKSCustomCloud is added to Linux cloud-init variables. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/Flatcar/CustomData.inner | Updates expected Flatcar CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/CustomizedImage/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Updates expected CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Updates expected Windows CustomData snapshot (calls/refresh task additions). |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Updates expected Windows CustomData snapshot (new Get-CACertificates call form + refresh task). |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Updates expected Windows CustomData snapshot (new Get-CACertificates call form + refresh task). |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Updates expected Ubuntu CustomData snapshot (generated content changed). |
| pkg/agent/testdata/ACL/CustomData.inner | Updates expected ACL CustomData snapshot (generated content changed). |
| pkg/agent/const.go | Consolidates custom-cloud init script constants to a single script. |
| parts/windows/kuberneteswindowssetup.ps1 | Updates Windows setup flow to call Get-CACertificates with location and registers CA refresh scheduled task. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removes operation-requests-specific Linux init script (consolidation). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removes Mariner/AzureLinux operation-requests init script (consolidation). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removes Mariner/AzureLinux legacy init script variant (consolidation). |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Adds a LOCATION shell variable in the generated CSE command template. |
| aks-node-controller/parser/helper.go | Factors out a shared getCloudLocation helper and reuses it in getCloudTargetEnv. |
2b3c1d6 to e19a19b
e19a19b to d41856f
Pull request overview
This PR unifies the AKS custom cloud CA certificate bootstrap logic to a single flow and adds a Windows scheduled task to periodically refresh custom cloud CA certificates. It also updates Linux/customdata generation and test snapshots to reflect the new wiring.
Changes:
- Add Windows scheduled task registration for daily CA certificate refresh and introduce a location-based cert endpoint mode selector.
- Simplify Linux custom cloud init script selection by standardizing on init-aks-custom-cloud.sh, plus add wiring/tests for refresh-mode arguments.
- Update aks-node-controller template to export LOCATION, and regenerate CustomData snapshot test artifacts.
Reviewed changes
Copilot reviewed 95 out of 101 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.tests.ps1 | Adds Pester coverage for cert endpoint mode selection, scheduled task registration, and CA retrieval behavior. |
| staging/cse/windows/kubernetesfunc.ps1 | Implements unified Windows CA retrieval logic with legacy/rcv1p modes and registers a daily refresh scheduled task. |
| spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh | Adds ShellSpec assertions to validate refresh-mode argument parsing/wiring in the Linux init script. |
| pkg/agent/variables.go | Changes how initAKSCustomCloud is injected into Linux cloud-init data. |
| pkg/agent/const.go | Removes per-cloud custom init script constants and standardizes on init-aks-custom-cloud.sh. |
| parts/windows/kuberneteswindowssetup.ps1 | Wires CA retrieval call and registers the Windows CA refresh scheduled task during BasePrep. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removed (operation-requests variant no longer used). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removed (operation-requests Mariner/AzureLinux variant no longer used). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removed (Mariner/AzureLinux legacy variant no longer used). |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Exports LOCATION into the CSE environment for downstream scripts. |
| aks-node-controller/parser/helper.go | Adds a helper to normalize location and reuses it in cloud target env detection. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/Flatcar/CustomData.inner | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to Windows CA refresh task wiring. |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
| pkg/agent/testdata/ACL/CustomData.inner | Regenerated CustomData snapshot due to init/custom cloud wiring changes. |
d41856f to 18ba549
18ba549 to e94c465
Pull request overview
This PR updates AKS custom cloud certificate bootstrapping to use a single unified flow and adds a Windows scheduled task for periodic custom cloud CA refresh.
Changes:
- Added Windows CA refresh task registration plus new logic to select cert retrieval mode and opt-in gating.
- Simplified Linux custom cloud init script wiring by removing legacy “operation-requests” variants and normalizing location for refresh mode.
- Added/updated tests and refreshed golden testdata outputs to reflect new custom data content.
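The location normalization and mode selection described in the list above might look roughly like the following. This is only a sketch under stated assumptions: the function names, the normalization rule (lowercase, whitespace stripped), and the RCV1P_LOCATIONS placeholder list are all hypothetical, since this page does not show the real helper or which locations select rcv1p mode.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; none of these names come from the PR itself.

# Lowercase the location and strip whitespace so that display-style
# and canonical-style location strings compare equal.
normalize_location() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]'
}

# Placeholder list: the real set of rcv1p locations is not given here.
RCV1P_LOCATIONS="examplelocation1 examplelocation2"

# Print "rcv1p" when the normalized location is in the list, else "legacy".
select_cert_endpoint_mode() {
    local location candidate
    location="$(normalize_location "$1")"
    for candidate in $RCV1P_LOCATIONS; do
        if [ "$location" = "$candidate" ]; then
            echo "rcv1p"
            return 0
        fi
    done
    echo "legacy"
}
```

A caller could then branch on the printed mode, for example `mode="$(select_cert_endpoint_mode "$LOCATION")"`.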
Reviewed changes
Copilot reviewed 95 out of 101 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| staging/cse/windows/kubernetesfunc.tests.ps1 | Adds Pester coverage for endpoint-mode selection, task registration behavior, and CA retrieval failure handling. |
| staging/cse/windows/kubernetesfunc.ps1 | Implements endpoint-mode derivation, opt-in gating, CA retrieval paths, and a Windows scheduled task for refresh. |
| spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh | Adds ShellSpec checks to ensure init script wiring for ca-refresh mode and LOCATION usage. |
| pkg/agent/variables.go | Simplifies init script selection and updates how custom cloud init script is injected into cloud-init data. |
| pkg/agent/const.go | Removes now-unused custom-cloud init script constants; keeps unified init script constant. |
| parts/windows/kuberneteswindowssetup.ps1 | Updates Windows setup to call Get-CACertificates with Location and conditionally register refresh task. |
| aks-node-controller/parser/templates/cse_cmd.sh.gtpl | Adds LOCATION variable for downstream scripts during custom cloud provisioning. |
| aks-node-controller/parser/helper.go | Adds getCloudLocation helper and reuses it for cloud target env detection. |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh | Removes legacy operation-requests init script (superseded by unified script). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh | Removes legacy Mariner operation-requests init script (superseded by unified script). |
| parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh | Removes legacy Mariner init script variant (superseded by unified script). |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/Flatcar/CustomData.inner | Updates golden ignition/customData payload for unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImageKata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/CustomizedImage/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AzureLinuxV3+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AzureLinuxV2+Kata/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingNoConfig/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworkingDisabled/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows23H2Gen2+NextGenNetworking/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+ootcredentialprovider/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+SecurityProfile/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+ManagedIdentity/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+KubeletServingCertificateRotation/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+KubeletClientTLSBootstrapping/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119+FIPS/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S119+CSI/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S118/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S117/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+K8S116/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+EnablePrivateClusterHostsConfigAgent/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomVnet/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSWindows2019+CustomCloud+ootcredentialprovider/CustomData | Updates golden Windows customData to pass Location to CA cert retrieval + refresh task gating. |
| pkg/agent/testdata/AKSUbuntu2404+Teleport/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2404+NetworkPolicy/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+ootcredentialprovider/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+cgroupv2/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SecurityProfile/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOn/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+SSHStatusOff/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/AKSUbuntu2204+China/CustomData | Updates golden customData to match unified custom cloud init content. |
| pkg/agent/testdata/ACL/CustomData.inner | Updates golden ignition/customData payload for unified custom cloud init content. |
Comments suppressed due to low confidence (7)
staging/cse/windows/kubernetesfunc.ps1:1 (reported twice)
- Get-CACertificates used to fail fast via Set-ExitCode on retrieval/parse errors, but now returns $false (and logs warnings) for a wide range of failure cases. Because call sites in the generated setup scripts invoke Get-CACertificates -Location $Location without checking the return value, this can silently proceed without required CA material and lead to harder-to-diagnose TLS failures later in provisioning. Consider restoring fatal behavior for “expected-to-install” scenarios (e.g., legacy mode, or rcv1p when opted-in), or have callers check the return value and invoke Set-ExitCode when it’s $false in those modes.
pkg/agent/variables.go:1
- This change removes the previous cs.IsAKSCustomCloud() guard and injects the custom cloud init script into cloudInitData unconditionally. That can increase customData size for all clusters (risking platform limits) and may introduce unintended side effects if any downstream template writes/executes this script outside custom cloud. Recommend reinstating the custom cloud guard (and only setting initAKSCustomCloud when IsAKSCustomCloud() is true), while still using the unified initAKSCustomCloudScript for all custom clouds.
staging/cse/windows/kubernetesfunc.ps1:1 (reported twice)
- $resourceFileName is used directly to build a path under C:\ca. If the upstream response ever contains path separators (e.g., ..\foo or nested paths), this can write outside the intended directory. Prefer sanitizing to a basename (e.g., using Split-Path -Leaf or [IO.Path]::GetFileName($resourceFileName)) before Join-Path, and consider rejecting names containing directory traversal characters.
staging/cse/windows/kubernetesfunc.ps1:1
- The new rcv1p operation-requests flow is non-trivial (multiple requests, JSON shape assumptions, per-item content downloads, and $downloadedAny aggregation), but the added Pester tests only cover legacy mode and the “throws returns false” path. Add tests that (1) exercise the rcv1p path end-to-end with mocked Retry-Command returning operation requests and cert bodies, and (2) verify behavior when operation requests are empty/invalid (ensuring the function returns $false and logs expected warnings).
pkg/agent/variables.go:1
- The PR description still contains placeholder text (“Fixes #” with no linked issue and no explanation of “what/why”). Please update the PR description to summarize the behavior change (unified bootstrap + Windows refresh task) and link the relevant issue or remove the placeholder.
e94c465 to f20d5b8
f20d5b8 to b53f240
…ptionId param
Variable group ab-e2e-tme defines E2E_SUBSCRIPTION_ID, which ADO auto-exposes as an env var to all tasks. The previous task-level env: override did not reliably win against the auto-exposed group variable in AzureCLI@2, so the orchestrator's --subscription-id (e.g. RCV1P sub) was ignored and tests ran/skipped against the default TME sub.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ails
The rcv1p opted-in path previously ignored the exit status of install_certs_to_trust_store, allowing provisioning to continue with an incomplete trust store. This now matches the legacy path's behavior of exiting non-zero on failure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
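The fix in this commit message amounts to checking the exit status of install_certs_to_trust_store instead of discarding it. A minimal sketch of that pattern follows, assuming a stub in place of the real function (whose body is not shown on this page) and a hypothetical wrapper named bootstrap_rcv1p_certs. The sketch returns non-zero rather than calling exit so that it can be sourced safely.

```shell
#!/usr/bin/env bash
# Stub standing in for the real install_certs_to_trust_store (assumption):
# it succeeds only when the given directory exists and is non-empty.
install_certs_to_trust_store() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical wrapper: propagate the failure instead of ignoring it.
bootstrap_rcv1p_certs() {
    local cert_dir="$1"
    if ! install_certs_to_trust_store "$cert_dir"; then
        echo "failed to install certs from ${cert_dir}" >&2
        return 1
    fi
    echo "trust store updated"
}
```

A provisioning script would then stop on failure with something like `bootstrap_rcv1p_certs "$cert_dir" || exit 1`.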
…l variable reference
Previously the subscriptionId parameter defaulted to $(E2E_SUBSCRIPTION_ID)
and the template defined a variable E2E_SUBSCRIPTION_ID with value
${{parameters.subscriptionId}}, creating a cycle when no caller explicitly
passed subscriptionId. Use an empty-string sentinel default and only override
the pipeline variable when the parameter is non-empty.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- init-aks-custom-cloud.sh: sanitize cert_filename via basename to prevent path traversal from wireserver response (defense-in-depth)
- init-aks-custom-cloud.sh: quote $certs in retrieve_legacy_certs to avoid word-splitting on JSON payload
- scenario_rcv1p_win_test.go / validators.go: update misleading comments that claimed Windows cert store import; current validators only check files in C:\ca
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
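Both shell fixes in this commit can be illustrated in a few lines. The sketch below is not the code from init-aks-custom-cloud.sh; sanitize_cert_filename and count_args are hypothetical helpers, and the extra rejection of empty, ".", "..", and "/" results is an added assumption on top of the basename call the commit mentions.

```shell
#!/usr/bin/env bash
# Reduce a server-supplied name to its final path component and
# reject results that are not usable as a plain file name.
sanitize_cert_filename() {
    local name
    name="$(basename -- "$1")"
    case "$name" in
        ""|"."|".."|"/") return 1 ;;
    esac
    printf '%s\n' "$name"
}

# Helper that reports how many arguments it received; used to show
# what word-splitting does to an unquoted JSON payload.
count_args() {
    echo "$#"
}
```

With `certs='{"name": "root ca"}'`, calling `count_args "$certs"` passes the payload as one argument, while the unquoted `count_args $certs` splits it on whitespace into several.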
…gainst RCV1P sub
When the aks-rp orchestrator routes e2e-tme.yaml to the RCV1P testing subscription via --subscription-id, the variables inherited from variable group ab-e2e-tme are incompatible with the RCV1P sub:
- BLOB_STORAGE_ACCOUNT_PREFIX=abe2etme yields globally-taken account name abe2etmewestus3 (already owned by the regular TME E2E sub), causing StorageAccountAlreadyTaken on every Linux RCV1P scenario.
- RCV1P_TAGS_AUTO_INJECTED defaults true, but the RCV1P sub does not auto-inject opt-in tags; the framework must stamp them explicitly.
- IGNORE_SCENARIOS_WITH_MISSING_VHD=false fails Windows RCV1P scenarios when the Linux orchestrator only published Linux VHDs.
Detect the RCV1P sub at runtime in e2e_run.sh and override these three settings so the same e2e-tme.yaml pipeline works for both regular and RCV1P targets without requiring orchestrator changes.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ardcoded GUID Replace the hardcoded RCV1P subscription GUID with a comparison against E2E_SUBSCRIPTION_ID_RCV1P sourced from the variable group. This keeps subscription identity out of the script, lets the value rotate without code changes, and makes the override path a no-op for pipelines (e.g. the MSFT-tenant default) that do not define E2E_SUBSCRIPTION_ID_RCV1P. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Storage account names are globally unique in Azure. When a pipeline that normally targets one subscription is redirected at runtime to another (e.g. orchestrator-routed RCV1P run via --subscription-id), the existing account name collides with StorageAccountAlreadyTaken and the run fails before any test executes. Append a deterministic 6-hex suffix derived from SubscriptionID to make the account name unique per subscription with zero per-environment configuration. The framework remains agnostic to which subscription is running — no subscription identity check anywhere in this repo. Also reverts the RCV1P-specific override block from e2e_run.sh (which coupled AgentBaker to an aks-rp variable group). Remaining RCV1P-sub mismatches (RCV1P_TAGS_AUTO_INJECTED default, missing-VHD handling) will be addressed in a follow-up. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ValidateRCV1PCertModeWindows only checks C:\ca files and the aks-ca-certs-refresh-task scheduled task; it does not import certs into the Windows certificate store. Update Test_RCV1P_Windows2022 and Test_RCV1P_Windows2022Gen2 docstrings to reflect actual coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DIAGNOSTIC PATCH (may be reverted). When the top-level try/catch in kuberneteswindowssetup.ps1 catches an exception, Set-ExitCode calls `exit` which skips the `finally` block, so provision.complete is never written and csecmd.ps1 surfaces the opaque WINDOWS_CSE_ERROR_NO_CSE_RESULT_LOG (50) instead of the real inner ExitCode/ErrorMessage. Write the completion file from the catch block so MSFT-tenant Windows RCV1P e2e failures show their actual root cause. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit 2926323.
On baked Windows VHDs, C:\AzureData\windows\azurecnifunc.ps1 already exists, which short-circuits the CSEScriptsPackage download and overwrite. That means an operator-supplied (fully-qualified) CSEScriptsPackageUrl is silently ignored and the VM runs the cached, baked CSE scripts instead. Production RP only ever sets CSEScriptsPackageUrl as a base URL ending in '/'; a fully-qualified .zip URL is opt-in (e.g. RCV1P e2e branch builds). Treat that signal as 'always overwrite cached scripts' so branch fixes actually execute on baked VHDs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BlobStorageAccount() concatenates BlobStorageAccountPrefix + DefaultLocation + 6-hex subscription suffix. Current defaults (abe2e + westus3 + 6) yield 18 chars, but a future longer region (e.g. germanywestcentral, 18) would push the total to 29 and break account creation. Truncate the base portion deterministically so the suffix is preserved and identical inputs always resolve to the same account name. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
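The naming rule described in the commits above (prefix + location + 6-hex subscription suffix, with the base truncated so the total fits Azure's 24-character storage account limit) can be sketched in shell. This is an illustration only: the real `BlobStorageAccount()` is Go code not shown here, the function name `blob_storage_account_name` is hypothetical, and taking the first six hex digits of the subscription ID as the suffix is an inference from the account names quoted elsewhere in this PR.

```shell
# Hypothetical helper, not the PR's Go implementation.
# Usage: blob_storage_account_name PREFIX LOCATION SUBSCRIPTION_ID
blob_storage_account_name() {
    local prefix location subscription_id max_len suffix_len base_len suffix base
    prefix=$1
    location=$2
    subscription_id=$3
    max_len=24      # Azure storage account names: 3-24 lowercase letters/digits
    suffix_len=6
    base_len=$((max_len - suffix_len))

    # Assumption: suffix = first six hex digits of the subscription GUID.
    suffix=$(printf '%s' "$subscription_id" | tr -d '-' | tr '[:upper:]' '[:lower:]' | cut -c1-"$suffix_len")
    # Truncate only the base so the suffix always survives intact.
    base=$(printf '%s%s' "$prefix" "$location" | cut -c1-"$base_len")
    printf '%s%s\n' "$base" "$suffix"
}
```

Called with `abe2etme`, `southcentralus` and a subscription ID beginning `38d77129`, the sketch yields `abe2etmesouthcentr38d771`, the same shape as the account name that appears in a later commit message. Because only the base is cut, identical inputs always map to the same name.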
…epos Replace `[[ ]]` with `[ ]`, quote `\`, and capture curl's exit code explicitly before grep. Removes the SC3010 disable; matches the repo's POSIX shellcheck gate. Also renames the variable to `curl_output` since it holds stdout, not an exit code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…L provided" This reverts commit 7e59539.
When CSE crashes before reaching the main try/finally block, the finally never writes provision.complete, so csecmd.ps1's wrapper throws the opaque "WINDOWS_CSE_ERROR_NO_CSE_RESULT_LOG" (exit 50). Azure refuses to run a follow-up RunCommand once the extension is in failed state, so we cannot fetch CustomDataSetupScript.log after the fact. This diagnostic pre-writes a placeholder provision.complete with marker exit code 99 and installs a top-level trap that captures the actual error message before any nested `exit` bypasses the main finally. If CSE completes normally, the finally overwrites the placeholder with the real result, so production behavior is unchanged on success. REVERT this commit once the RCV1P regression is diagnosed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…(REVERT ME)" This reverts commit 9f5e36b.
…o surface preamble failures (REVERT ME) Without this guard, any failure between line ~256 and the main try-block at line ~632 (download, expand-archive, or top-level error in any dot-sourced .ps1 file) leaves the VM with no provision.complete file, and CSE returns the opaque WINDOWS_CSE_ERROR_NO_CSE_RESULT_LOG (exit 50) with no information about the actual cause. This wrapper writes a structured provision.complete with exit code 76 and the underlying error message, converting the silent failure into a diagnosable one. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
function check_url {
    local url=$1
    echo "Checking url: $url"

    # Use curl to check the URL and capture both stdout and stderr
    curl_output=$(curl -s --head --request GET "$url")
    curl_status=$?
    # Check the exit status of curl
    if [ $curl_status -ne 0 ] || echo "$curl_output" | grep -Eq "404 Not Found"; then
        echo "ERROR: $url is not available. Please manually check if the url is valid before re-running script"
        exit 1
    fi
}

function aptget_update {
    echo "apt-get updating..."
    echo "note: depending on how many sources have been added this may take a couple minutes..."
    if apt-get update | grep -q "404 Not Found"; then
        echo "ERROR: apt-get update failed to find all sources. Please validate the sources or remove bad sources from your sources and try again."
        exit 1
    else
        echo "apt-get update complete!"
    fi
}

if [ "$IS_MARINER" -eq 1 ]; then
    echo "Initializing Mariner repo depot settings..."
    init_mariner_repo_depot ${marinerRepoDepotEndpoint}
    dnf_makecache || exit 1
else
    echo "Initializing Azure Linux repo depot settings..."
    init_azurelinux_repo_depot ${marinerRepoDepotEndpoint}
    dnf_makecache || exit 1
fi

Describe 'sourcing contract — script must be safe to source'
    It 'does not call exit at top level (would terminate the sourcing parent)'
        # Exits only allowed inside functions (check_url, aptget_update, dnf_makecache)
        # or guarded by branch-internal error handling. No bare top-level exit.
        When run grep -En '^exit( |$)' "$script_path"
        The status should eq 1
    End

# Source the repo depot and chrony initialization script if present.
# This script is only included in custom cloud images and handles repo depot
# configuration and chrony setup. It inherits all variables from this script.
REPOS_SCRIPT="$(dirname "$(readlink -f "$0")")/init-aks-custom-cloud-repos.sh"
if [ -f "$REPOS_SCRIPT" ] && [ -s "$REPOS_SCRIPT" ]; then
    source "$REPOS_SCRIPT"
fi
…ain baseline Diagnostic: if RCV1P Windows e2e passes with this commit, the failure is in our kubernetesfunc.ps1 edits. If it still fails, the failure is elsewhere (other staging/cse/windows/ files or e2e infra). MUST be reverted before merge.
… account The per-subscription storage account naming scheme (e.g. abe2etmewestus338d771 for sub 38d77129...) creates a fresh storage account whenever a new E2E_SUBSCRIPTION_ID is used. The e2e bootstrap was granting Storage Blob Data Contributor only to the VM managed identity, not to the test runner principal (ADO service-connection SP in pipelines, or developer's user identity locally). The runner has management-plane Contributor (sufficient to CREATE the storage account) but no data-plane RBAC, producing 'HTTP 403 AuthorizationPermissionMismatch' when uploading the branch CSE zip for RCV1P Windows tests. Resolve the runner's object ID by decoding the JWT 'oid' claim from an ARM access token. Distinguish ServicePrincipal vs User via the 'idtyp' claim so ARM accepts the role assignment in both pipeline and local-dev contexts. Idempotent: a 409 Conflict on re-run is swallowed (matches the existing assignRolesToVMIdentity behavior). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Get-CACertificates
{{end}}

Get-CACertificates -Location $Location -FailOnError

# Guard against older CSE packages that do not yet export Should-InstallCACertificatesRefreshTask.
# If the function is absent (old package), fall back to the previous unconditional behaviour so
# that legacy/ussec/usnat clusters continue to register the refresh task.
if (Get-Command -Name Should-InstallCACertificatesRefreshTask -ErrorAction Ignore) {
    if (Should-InstallCACertificatesRefreshTask -Location $Location) {
# Wireserver may return JSON ({"IsOptedInForRootCerts":true}) or key=value
# (IsOptedInForRootCerts=true). Use jq for proper JSON parsing.
if echo "$opt_in_response" | jq -e '.IsOptedInForRootCerts == true' > /dev/null 2>&1; then
    echo "IsOptedInForRootCerts=true"
    return 0
fi
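The comment in the snippet above mentions two response shapes, but only the JSON branch is visible here. A minimal sketch of a check that accepts both shapes without depending on jq, assuming the field name and values shown in that comment. The function name `is_opted_in` is hypothetical, and this is plain pattern matching rather than a JSON parser, so it is not a substitute for the jq check in the PR.

```shell
# Hypothetical helper: succeed when the opt-in response says "true" in either
# the JSON form or the key=value form described in the comment above.
is_opted_in() {
    local response compact
    response=$1
    # Drop all whitespace so '"key": true' and '"key":true' compare equal.
    compact=$(printf '%s' "$response" | tr -d '[:space:]')
    case "$compact" in
        *'"IsOptedInForRootCerts":true'*) return 0 ;;
        *'IsOptedInForRootCerts=true'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

An empty or malformed response falls through to the final branch and is treated as not opted in, which matches the skip-on-false behavior described in the PR description.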
function install_certs_to_trust_store {
    mkdir -p /root/AzureCACertificates

    debug_print_trust_store "before"

    if [ "$IS_ACL" -eq 1 ] || [ "$IS_MARINER" -eq 1 ] || [ "$IS_AZURELINUX" -eq 1 ]; then
        cp /root/AzureCACertificates/*.crt /etc/pki/ca-trust/source/anchors/
        update-ca-trust
    elif [ "$IS_FLATCAR" -eq 1 ]; then
The Azure Firewall app rule in aks_model.go:219 whitelists the storage account FQDN at cluster-creation time. After 3f1c640 made the storage account name subscription-unique (abe2ewestus3 -> abe2ewestus38ecadf in MSFT sub), pre-existing cached RCV1P clusters still embed the old FQDN in their firewall, so Windows VMs cannot reach the new storage to download the branch CSE zip, causing exit 50 (provision.complete not generated). Bumping the cluster name forces fresh clusters with a firewall rule matching the new storage account name. Follows the established vN versioning pattern used on main (abe2e-kubenet-v5, abe2e-azure-network-v4, etc.). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ps1 to main baseline" This reverts commit b450d67.
…/catch to surface preamble failures (REVERT ME)" This reverts commit 9f963d5.
…drifts The shared Azure Firewall `abe2e-fw` was created once with the legacy storage FQDN `abe2ewestus3.blob.core.windows.net`. After the per-sub storage rename (`BlobStorageAccount()` now returns `abe2ewestus3<sub-suffix>`), the firewall still blocked the new FQDN. Windows CSE failed with NO_CSE_RESULT_LOG because the zip download timed out before any log file could be written; Linux CSE hung on artifact downloads. `ensureSharedFirewall` previously returned early on any existing firewall, never reconciling its rules. This change detects FQDN drift on the dynamic blob-storage-fqdn rule and re-issues CreateOrUpdate with the current target FQDN, reusing the existing public IP to preserve external bindings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
if [ "$ret" -ne 0 ]; then
    return $ret
fi
}

elif [[ $NAME == *"Mariner"* ]]; then
    IS_MARINER=1
elif [[ $NAME == *"Microsoft Azure Linux"* ]]; then
    IS_AZURELINUX=1
foreach ($operation in $operationJson.OperationsInfo) {
    $resourceFileName = $operation.ResouceFileName
    if ([string]::IsNullOrEmpty($resourceFileName)) {
        continue
    }

    $resourceType = [IO.Path]::GetFileNameWithoutExtension($resourceFileName)
    $resourceExt = [IO.Path]::GetExtension($resourceFileName).TrimStart('.')
    $resourceUri = "http://168.63.129.16/machine?comp=acmspackage&type=$resourceType&ext=$resourceExt"

    $certContentResponse = Retry-Command -Command 'Invoke-WebRequest' -Args @{Uri=$resourceUri; UseBasicParsing=$true} -Retries 10 -RetryDelaySeconds 10
    if ([string]::IsNullOrEmpty($certContentResponse.Content)) {
        Write-Log "Warning: empty certificate content for $resourceFileName"
        continue
    }

    $certFilePath = Join-Path $caFolder $resourceFileName
    Write-Log "Write certificate $resourceFileName to $certFilePath"
    $certContentResponse.Content > $certFilePath
    $downloadedAny = $true
}
… upload Windows RCV1P scenarios construct rcv1pWindowsCSEMutator at scenario-struct init time, which calls getOrBuildBranchCSEPackageURL -> buildAndUploadCSEZip. This runs BEFORE RunScenario -> CachedCreateVMManagedIdentity, which is what actually creates the per-sub blob storage account on first use. On westus3 the account had been created by prior runs so the upload worked. The first ever run in southcentralus had no pre-existing account, so the blob client's first PUT failed with NXDOMAIN: dial tcp: lookup abe2etmesouthcentr38d771.blob.core.windows.net on 127.0.0.53:53: no such host Piggyback on the same CachedCreateVMManagedIdentity that Linux scenarios use to guarantee the storage account exists before attempting the upload. Bump the ctx timeout from 2m to 5m to cover a cold-start storage account create (~30-90s) on top of the zip build/upload. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What this PR does / why we need it
This PR implements RCV1P (Robust Certificate Validation for 1P) — the next-generation mechanism for distributing Azure root CA certificates to AKS nodes. Instead of hardcoding certificate bundles, RCV1P queries the Azure wireserver at provisioning time to download and install the latest root certificates into the OS trust store.
Reference: https://eng.ms/docs/products/onecert-certificates-key-vault-and-dsms/onecert-customer-guide/autorotationandecr/rcv1ptsg
Summary of Changes
1. Linux: Unified cert bootstrap flow (`init-aks-custom-cloud.sh`)
- Removed `init-aks-custom-cloud-mariner.sh`, `init-aks-custom-cloud-operation-requests-mariner.sh`, and `init-aks-custom-cloud-operation-requests.sh`. All cert logic now flows through a single `init-aks-custom-cloud.sh` that detects the distro (Ubuntu, Mariner, AzureLinux, Flatcar, ACL) at runtime.
- … `init-aks-custom-cloud-repos.sh` to keep base `customData` size small for non-custom-cloud scenarios (critical for Flatcar/ACL which have tight size limits).
- … `legacy` (ussec/usnat regions) and `rcv1p` (all other regions), selected by cloud location at runtime.
2. Windows: CA cert refresh task and rcv1p support (`kubernetesfunc.ps1`)
- `Get-CACertificates` with `-Location` parameter: Determines cert endpoint mode from location, uses legacy endpoint for ussec/usnat, rcv1p for all others.
- `Register-CACertificatesRefreshTask`: Registers a daily scheduled task to refresh CA certificates, with backward compatibility for older VHDs that don't accept `-Location`.
- `Should-InstallCACertificatesRefreshTask`: Gates refresh task registration on wireserver opt-in status.
3. E2E tests (`e2e/scenario_rcv1p_test.go`, `e2e/scenario_rcv1p_win_test.go`)
- … `C:\ca`, Windows certificate store import, and scheduled task registration.
- `Test_RCV1P_NotOptedIn` verifies that omitting the VM opt-in tag correctly prevents cert installation.
- `.pipelines/e2e-rcv1p.yaml` runs daily at 3am PST with tag filter `rcv1pcertmode=true` (not yet enabled).
4. E2E infrastructure: multi-subscription and VM instance tagging
- … `RCV1P_SUBSCRIPTION_ID`) with the `Microsoft.Compute/PlatformSettingsOverride` feature flag. Added `SubscriptionID` field to scenarios and `GetAzure()`/`GetSubscriptionID()` helpers.
- … `platformsettings.host_environment.service.platform_optedin_for_rootcerts=true`) on the VMSS at creation time via a `VMConfigMutator`. VMSS-level tags inherit to VM instances automatically.

1. Cert endpoint mode is determined by cloud location, not a flag
Decision: `ussec*`/`usnat*` → `legacy` mode, everything else → `rcv1p` mode. This is determined at runtime from the node's Azure location.
Why: Avoids requiring a new API contract field. The location-based approach lets us roll out rcv1p incrementally: ussec/usnat stay on the legacy endpoint that works today, while all other regions use the new rcv1p endpoint with opt-in gating.
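The location-to-mode rule in this decision can be sketched as a small shell function. The function name `cert_endpoint_mode` is hypothetical, the sample region names in the usage note are illustrative, and lower-casing the input first is an assumption added here rather than something the PR states.

```shell
# Hypothetical helper: map an Azure location name to a cert endpoint mode.
cert_endpoint_mode() {
    local location
    # Assumption: treat the location case-insensitively.
    location=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$location" in
        ussec*|usnat*) echo "legacy" ;;
        *) echo "rcv1p" ;;
    esac
}
```

For example, `cert_endpoint_mode ussecwest` prints `legacy`, while `cert_endpoint_mode westus3` prints `rcv1p`. Keeping the rule in one function means the opt-in gating only needs to run on the `rcv1p` branch.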
2. Two-layer access control for rcv1p
Decision: Both conditions must be met for cert installation:
- Subscription feature flag (`Microsoft.Compute/PlatformSettingsOverride`) enables the wireserver endpoint
- VM opt-in tag (`platformsettings.host_environment.service.platform_optedin_for_rootcerts=true`) grants per-VM access
Why: Defense in depth: the subscription flag is a coarse gate, and the VM tag provides per-node opt-in control. Without the tag, wireserver returns `IsOptedInForRootCerts=false`.
3. VM opt-in tag is set at VMSS creation time
Decision: The opt-in tag (`platformsettings.host_environment.service.platform_optedin_for_rootcerts=true`) is set on the VMSS at creation time and inherits to all VM instances automatically.
Why: VMSS-level tags propagate to VM instances, and wireserver reads the tag from the VM instance to determine opt-in status. In E2E tests, the positive tests set the tag via a `VMConfigMutator` at VMSS creation, while the negative test (`Test_RCV1P_NotOptedIn`) simply omits the tag to verify wireserver returns `IsOptedInForRootCerts=false`.
4. `Get-CACertificates` moved outside `IsAKSCustomCloud` guard (Windows)
Decision: `Get-CACertificates -Location $Location -FailOnError` now runs for all clouds, not just custom clouds.
Why: RCV1P applies to all clouds. The function itself handles the location-based mode selection internally and gracefully skips cert installation when wireserver returns `IsOptedInForRootCerts=false` (which is the case on public cloud without the feature flag).
5. Wireserver failures are fatal after retries
Decision: If wireserver cert endpoints fail after exhausting retries, provisioning fails (`exit 1` on Linux, `throw` on Windows with `-FailOnError`).
Why: Cert installation is required for the selected mode. Silently continuing without certificates would leave the node in an inconsistent state. Retries with backoff handle transient wireserver issues (rate limiting, temporary unavailability).
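The "retry with backoff, then fail hard" behavior in this decision can be sketched as a generic shell wrapper. This is not the PR's retry code: the function name `retry_with_backoff`, the doubling delay, and the argument order are all assumptions made for illustration.

```shell
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, doubling the
# sleep between attempts; return non-zero once the attempts are exhausted.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry_with_backoff() {
    local max_attempts delay attempt
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "ERROR: command failed after $attempt attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

A caller would make the failure fatal itself, for example `retry_with_backoff 5 2 some_wireserver_call || exit 1`, where `some_wireserver_call` is a placeholder for whatever command fetches the certificates.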
6. Backward compatibility for Windows VHD/CSE version skew
Decision: `kuberneteswindowssetup.ps1` guards `Register-CACertificatesRefreshTask` with `Get-Command` checks before calling it.
Why: Windows VHD and CSE release independently. Newer CSE must not crash on older VHDs that don't have these functions. The guard falls back gracefully.
Testing Evidence
MSFT tenant (default E2E subscription)
Linux (Build 158446017):
- `IsOptedInForRootCerts` check works (skips on public cloud as expected)
Windows (Build 158446024):
- `Get-CACertificates -Location` correctly selects rcv1p mode
- `Should-InstallCACertificatesRefreshTask` returns `$false` on public cloud (correct)
TME tenant (RCV1P_SUBSCRIPTION_ID set in pipeline, with PlatformSettingsOverride feature flag)
Linux (validated end-to-end): wireserver returns `IsOptedInForRootCerts=true`, certificates downloaded and installed into OS trust store, refresh schedule registered. Passed across Ubuntu 2204, Ubuntu 2404, AzureLinux V3, Flatcar, ACL.
Windows (Build 161633049):
- … `IsOptedInForRootCerts=true`, certificates downloaded to `C:\ca`, scheduled task `aks-ca-certs-refresh-task` registered
- … `windows-2022-containerd` job) was a pre-existing `Test_Windows2022_VHDCaching` issue unrelated to RCV1P
- … `OperationRequests` instead of `OperationsInfo`) when parsing wireserver responses; this was the root cause of empty cert downloads. Fixed in commit `b6cd4e4f68`.
Files Changed (31 files, +1979 / -1218)
- `init-aks-custom-cloud.sh`, `init-aks-custom-cloud-repos.sh` (new), 3 removed
- `kubernetesfunc.ps1`, `kuberneteswindowssetup.ps1`
- `kubernetesfunc.tests.ps1` (new)
- `scenario_rcv1p_test.go` (new), `scenario_rcv1p_win_test.go` (new)
- `vmss.go`, `types.go`, `validators.go`, `cluster.go`, `config/e2e-rcv1p.yaml` (new)
- `baker.go`, `const.go`, `variables.go`
PR File Breakdown: Functionality vs Tests
Functionality (1,859 lines, 51%)
- `parts/linux/cloud-init/artifacts/init-aks-custom-cloud.sh`
- `parts/linux/cloud-init/artifacts/init-aks-custom-cloud-repos.sh`
- `parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests.sh`
- `parts/linux/cloud-init/artifacts/init-aks-custom-cloud-operation-requests-mariner.sh`
- `staging/cse/windows/kubernetesfunc.ps1`
- `parts/linux/cloud-init/artifacts/init-aks-custom-cloud-mariner.sh`
- `pkg/agent/variables.go`
- `parts/windows/kuberneteswindowssetup.ps1`
- `pkg/agent/const.go`
- `parts/linux/cloud-init/nodecustomdata.yml`
- `aks-node-controller/parser/helper.go`
- `parts/linux/cloud-init/artifacts/cse_cmd.sh`
- `aks-node-controller/parser/templates/cse_cmd.sh.gtpl`
- `pkg/agent/baker.go`
Tests / E2E Infra (1,795 lines, 49%)
- `e2e/scenario_rcv1p_test.go`
- `staging/cse/windows/kubernetesfunc.tests.ps1`
- `e2e/scenario_rcv1p_win_test.go`
- `e2e/validators.go`
- `e2e/cluster.go`
- `e2e/vmss.go`
- `e2e/config/azure.go`
- `e2e/test_helpers.go`
- `e2e/types.go`
- `spec/parts/linux/cloud-init/artifacts/init_aks_custom_cloud_spec.sh`
- `e2e/cache.go`
- `e2e/aks_model.go`
- `e2e/config/config.go`
- `.pipelines/e2e-rcv1p.yaml`
- `e2e/kube.go`
- `.pipelines/scripts/e2e_run.sh`
- `.pipelines/templates/e2e-template.yaml`
Summary