From 9eea0c11d38d9112ab20085c8326a71a0836824b Mon Sep 17 00:00:00 2001
From: Alexander Laye
Date: Tue, 9 Jun 2026 11:27:48 -0400
Subject: [PATCH 1/4] update links and add multi-cloud

Signed-off-by: Alexander Laye
---
 .../preview/architecture/overview.md         |   2 +-
 .../preview/getting-started/deploy-on-aks.md |   4 +-
 .../multi-region-deployment/multi-cloud.md   | 148 ++++++++++++++++++
 .../preview/multi-region-deployment/setup.md |   2 +-
 mkdocs.yml                                   |   1 +
 5 files changed, 153 insertions(+), 4 deletions(-)
 create mode 100644 docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md

diff --git a/docs/operator-public-documentation/preview/architecture/overview.md b/docs/operator-public-documentation/preview/architecture/overview.md
index 2c0e16a23..d8f037741 100644
--- a/docs/operator-public-documentation/preview/architecture/overview.md
+++ b/docs/operator-public-documentation/preview/architecture/overview.md
@@ -309,7 +309,7 @@ flowchart LR
 NEW_PVC --> NEW_CLUSTER
 ```
 
-For backup configuration, see [Backup and Restore](../backup-and-restore.md).
+For backup configuration, see [Backup and Restore](../operations/backup-and-restore.md).
 
 ## Dependencies
 
diff --git a/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md b/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md
index 8a872280d..d843e361f 100644
--- a/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md
+++ b/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md
@@ -72,7 +72,7 @@ For available classes, see
 
 - [AKS storage classes](https://learn.microsoft.com/azure/aks/concepts-storage#storage-classes)
 - [Azure Disk CSI driver on AKS](https://learn.microsoft.com/azure/aks/azure-disk-csi)
-- [DocumentDB storage configuration](../advanced-configuration/README.md#storage-configuration)
+- [DocumentDB storage configuration](../configuration/storage.md)
 
 ## Monitoring and troubleshooting
 
@@ -111,7 +111,7 @@ kubectl get pods -n kube-system | grep csi-azuredisk
   enabled
 - [Encryption at rest](https://learn.microsoft.com/azure/aks/enable-host-encryption)
   on managed disks
-- [TLS configuration](../advanced-configuration/README.md#tls-configuration)
+- [TLS configuration](../configuration/tls.md)
   for database traffic
 - [Azure RBAC integration](https://learn.microsoft.com/azure/aks/manage-azure-rbac)
 
diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
new file mode 100644
index 000000000..22ddf882d
--- /dev/null
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
@@ -0,0 +1,148 @@
+---
+title: Multi-cloud differences
+description: Understand what changes when you extend a multi-region DocumentDB deployment across AKS, GKE, and EKS instead of staying within one cloud provider.
+tags:
+  - multi-region
+  - multi-cloud
+  - networking
+  - istio
+  - disaster-recovery
+---
+
+## Overview
+
+A multi-cloud DocumentDB deployment is a multi-region deployment where the
+participating Kubernetes clusters run in different cloud providers. The
+[primary-replica model](overview.md#primary-replica-model), replication
+configuration, and failover behavior are the same as any multi-region
+deployment.
+
+Use the multi-region docs as the baseline for topology, setup, and failover:
+
+- [Multi-region deployment overview](overview.md)
+- [Multi-region setup guide](setup.md)
+- [Multi-region failover procedures](failover-procedures.md)
+
+This page covers only the differences that matter when those Kubernetes clusters
+span AKS, GKE, EKS, or another provider mix.
+
+## What changes in multi-cloud deployments
+
+### Provider deployment guides still apply
+
+Start with the single-provider deployment guides when you need provider-specific
+Kubernetes prerequisites, storage defaults, authentication setup, and service
+exposure behavior. The multi-cloud layer changes how the Kubernetes clusters
+connect to each other; it doesn't replace the provider-specific setup work for
+each member Kubernetes cluster.
+
+- [Deploy on AKS](../getting-started/deploy-on-aks.md)
+- [Deploy on EKS](../getting-started/deploy-on-eks.md)
+- [Deploy on GKE](../getting-started/deploy-on-gke.md)
+
+### Networking becomes the primary design decision
+
+In a single cloud provider, you can usually rely on a cohesive private
+networking model such as VNet peering, VPC peering, private DNS, and provider
+load balancers. In a multi-cloud deployment, those assumptions don't hold across
+every Kubernetes cluster. You need an explicit cross-cloud networking design for
+replication traffic, service discovery, and operational access.
+
+Common approaches include:
+
+- **Istio multi-Kubernetes-cluster mesh:** Use east-west gateways, shared trust,
+  remote secrets, and mesh service discovery to route replication traffic across
+  providers.
+- **Site-to-site VPNs:** Connect cloud networks with VPN tunnels when your
+  organization needs private IP connectivity but doesn't have one native
+  provider network that spans every Kubernetes cluster.
+- **Cloud interconnect or private WAN:** Use dedicated connectivity when latency,
+  bandwidth, or compliance requirements exceed what internet-routed VPN or mesh
+  gateways can provide.
+- **Public load balancers with strict controls:** Use provider load balancers
+  only when private connectivity isn't available, and pair them with TLS,
+  firewall restrictions, and tightly scoped access.
+
+The playground uses Istio as the reference approach because it works across AKS,
+GKE, and EKS without requiring one provider's private networking model to cover
+all member Kubernetes clusters. For implementation details, see the
+[multi-cloud deployment playground](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment)
+and the upstream [Istio multi-primary multi-network documentation](https://istio.io/latest/docs/setup/install/multicluster/multi-primary_multi-network/).
+
+For operator-level networking configuration, including the
+`crossCloudNetworkingStrategy` field, see the
+[multi-region setup guide](setup.md#networking-management).
+
+### Service discovery needs a cross-provider plan
+
+DocumentDB replication depends on
+[stable service names and reachable endpoints](overview.md#dns-and-service-discovery).
+Within one provider, private DNS and network peering are often enough. Across
+providers, decide how each Kubernetes cluster resolves and reaches the services
+in the other Kubernetes clusters.
+
+If you use Istio, remote secrets and east-west gateways provide the discovery
+and routing path. If you use site-to-site VPNs or private WAN connectivity, make
+sure DNS zones, conditional forwarding, firewall rules, and route tables are
+configured consistently across providers.
+
+### Identity and permissions are provider-specific
+
+Multi-region deployments within one provider often share one IAM model. In a
+multi-cloud deployment, each provider has its own identity system, permission
+model, audit trail, and credential refresh behavior.
+
+Plan for:
+
+- Separate cloud identities for AKS, GKE, and EKS operations.
+- Kubernetes RBAC on every member Kubernetes cluster.
+- Fleet or GitOps permissions for resource propagation.
+- Provider-specific permissions for load balancers, disks, DNS, and network
+  changes.
+
+The playground README lists the required provider tools and permissions in
+[Prerequisites](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#prerequisites).
+
+### DNS and client routing need an external source of truth
+
+After failover, clients must reach the promoted primary DocumentDB cluster. In a
+single cloud provider, your DNS and traffic-routing options may be standardized.
+Across providers, choose a source of truth that all clients can use, such as a
+shared DNS zone, global traffic manager, or application configuration system.
+
+The playground can create Azure DNS records, including MongoDB SRV records, as
+one example. See
+[Azure DNS configuration](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#azure-dns-configuration)
+and [Connect to database](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#connect-to-database).
+
+### Observability must include the network layer
+
+Multi-cloud failures often show up first as networking symptoms: gateway health,
+DNS resolution, route changes, packet loss, or certificate trust issues. In
+addition to monitoring [replication lag](overview.md#replication-lag) and
+DocumentDB cluster health, monitor:
+
+- Istio control plane and east-west gateway health.
+- VPN or interconnect tunnel status.
+- Cross-provider latency and packet loss.
+- Provider load balancer health.
+- DNS record propagation and client resolution.
+
+For a playground observability example, see the
+[multi-cloud telemetry folder](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment/telemetry).
+
+## When to use the playground
+
+Use the multi-cloud playground when you want a runnable AKS, GKE, and EKS
+reference deployment. Keep the public docs as the conceptual baseline, and use
+the playground for exact commands, environment variables, templates, and cleanup
+steps:
+
+- [deploy.sh](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/deploy.sh)
+- [deploy-documentdb.sh](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/deploy-documentdb.sh)
+- [Multi-cloud failover operations](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#failover-operations)
+- [Multi-cloud troubleshooting](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#troubleshooting)
+
+## Next steps
+
+- [Multi-region failover procedures](failover-procedures.md)
diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/setup.md b/docs/operator-public-documentation/preview/multi-region-deployment/setup.md
index 5754ac235..bbf0fe298 100644
--- a/docs/operator-public-documentation/preview/multi-region-deployment/setup.md
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/setup.md
@@ -402,6 +402,6 @@ If PVCs aren't provisioning:
 ## Next steps
 
 - [Failover procedures](failover-procedures.md) - Learn how to perform planned and unplanned failovers
-- [Backup and restore](../backup-and-restore.md) - Configure multi-region backup strategies
+- [Backup and restore](../operations/backup-and-restore.md) - Configure multi-region backup strategies
 - [TLS configuration](../configuration/tls.md) - Secure connections with proper TLS certificates
 - [AKS Fleet deployment example](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/aks-fleet-deployment/README.md) - Automated Azure multi-region setup
diff --git a/mkdocs.yml b/mkdocs.yml
index 40bbb14e3..f3996eca4 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -63,6 +63,7 @@ nav:
     - Overview: preview/multi-region-deployment/overview.md
     - Setup Guide: preview/multi-region-deployment/setup.md
     - Failover Procedures: preview/multi-region-deployment/failover-procedures.md
+    - Multi-Cloud Differences: preview/multi-region-deployment/multi-cloud.md
   - FAQ: preview/faq.md
   - Tools:
     - Kubectl Plugin: preview/kubectl-plugin.md

From 04999354c1b9333e92540441b9c2952650b9b2e2 Mon Sep 17 00:00:00 2001
From: Alexander Laye
Date: Tue, 9 Jun 2026 11:45:52 -0400
Subject: [PATCH 2/4] correct title casing

Signed-off-by: Alexander Laye
---
 .../multi-region-deployment/multi-cloud.md | 22 +++++++++++++++++++
 .../multi-region-deployment/overview.md    |  2 +-
 mkdocs.yml                                 |  8 +++----
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
index 22ddf882d..ef45873e6 100644
--- a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
@@ -40,6 +40,27 @@ each member Kubernetes cluster.
 - [Deploy on EKS](../getting-started/deploy-on-eks.md)
 - [Deploy on GKE](../getting-started/deploy-on-gke.md)
 
+Set the `clusterList[].environment` field to indicate which cloud provider hosts each
+Kubernetes cluster, so that the provider-specific additions from those guides apply automatically.
+
+```yaml title="documentdb.yaml"
+apiVersion: documentdb.io/preview
+kind: DocumentDB
+metadata:
+  name: documentdb-preview
+  namespace: documentdb-preview-ns
+spec:
+  clusterReplication:
+    primary: member-eastus2-cluster
+    clusterList:
+      - name: aws-cluster
+        environment: eks
+      - name: azure-cluster
+        environment: aks
+      - name: gcp-cluster
+        environment: gke
+```
+
 ### Networking becomes the primary design decision
 
 In a single cloud provider, you can usually rely on a cohesive private
@@ -146,3 +167,4 @@ steps:
 ## Next steps
 
 - [Multi-region failover procedures](failover-procedures.md)
+- [Multi-cloud playground](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md)
diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/overview.md b/docs/operator-public-documentation/preview/multi-region-deployment/overview.md
index 341b3e7df..fe61eff19 100644
--- a/docs/operator-public-documentation/preview/multi-region-deployment/overview.md
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/overview.md
@@ -94,7 +94,7 @@ Use cloud-native VNet/VPC peering for direct Kubernetes cluster-to-cluster commu
 DocumentDB replication requires these ports between Kubernetes clusters:
 
 | Port | Protocol | Purpose                                  |
-| | | |
+|------|----------|------------------------------------------|
 | 5432 | TCP      | PostgreSQL streaming replication         |
 | 443  | TCP      | Kubernetes API (for KubeFleet, optional) |
 
diff --git a/mkdocs.yml b/mkdocs.yml
index f3996eca4..838fef48f 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -59,11 +59,11 @@ nav:
   - Monitoring:
     - Overview: preview/monitoring/overview.md
     - Metrics Reference: preview/monitoring/metrics.md
-  - Multi-Region Deployment:
+  - Multi-region deployment:
     - Overview: preview/multi-region-deployment/overview.md
-    - Setup Guide: preview/multi-region-deployment/setup.md
-    - Failover Procedures: preview/multi-region-deployment/failover-procedures.md
-    - Multi-Cloud Differences: preview/multi-region-deployment/multi-cloud.md
+    - Setup guide: preview/multi-region-deployment/setup.md
+    - Failover procedures: preview/multi-region-deployment/failover-procedures.md
+    - Multi-cloud differences: preview/multi-region-deployment/multi-cloud.md
   - FAQ: preview/faq.md
   - Tools:
     - Kubectl Plugin: preview/kubectl-plugin.md

From e58de1965a4c0cf0a097dc15d02ec380c81639b3 Mon Sep 17 00:00:00 2001
From: Alexander Laye <145385576+alaye-ms@users.noreply.github.com>
Date: Wed, 10 Jun 2026 09:07:17 -0400
Subject: [PATCH 3/4] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Alexander Laye
---
 .../preview/multi-region-deployment/multi-cloud.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
index ef45873e6..5e75443dc 100644
--- a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
@@ -51,7 +51,7 @@ metadata:
   namespace: documentdb-preview-ns
 spec:
   clusterReplication:
-    primary: member-eastus2-cluster
+    primary: azure-cluster
     clusterList:
       - name: aws-cluster
         environment: eks

From 3486420b8743175b42beb38e8bf2dae0f323225a Mon Sep 17 00:00:00 2001
From: Alexander Laye
Date: Fri, 12 Jun 2026 11:55:34 -0400
Subject: [PATCH 4/4] address review comments

Signed-off-by: Alexander Laye
---
 .../multi-region-deployment/multi-cloud.md | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
index 5e75443dc..6baf64a6f 100644
--- a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
@@ -71,9 +71,9 @@ replication traffic, service discovery, and operational access.
 
 Common approaches include:
 
-- **Istio multi-Kubernetes-cluster mesh:** Use east-west gateways, shared trust,
+- **Cross-Kubernetes-cluster mesh:** Use east-west gateways, shared trust,
   remote secrets, and mesh service discovery to route replication traffic across
-  providers.
+  providers. Tools like Istio can manage this automatically.
 - **Site-to-site VPNs:** Connect cloud networks with VPN tunnels when your
   organization needs private IP connectivity but doesn't have one native
   provider network that spans every Kubernetes cluster.
@@ -84,11 +84,13 @@ Common approaches include:
   only when private connectivity isn't available, and pair them with TLS,
   firewall restrictions, and tightly scoped access.
 
-The playground uses Istio as the reference approach because it works across AKS,
-GKE, and EKS without requiring one provider's private networking model to cover
+The playground uses Istio as the reference approach because it works easily across
+AKS, GKE, and EKS without requiring one provider's private networking model to cover
 all member Kubernetes clusters. For implementation details, see the
 [multi-cloud deployment playground](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment)
 and the upstream [Istio multi-primary multi-network documentation](https://istio.io/latest/docs/setup/install/multicluster/multi-primary_multi-network/).
+For high-throughput production workloads, consult networking professionals
+to select the best-performing tools for your needs.
 
 For operator-level networking configuration, including the
 `crossCloudNetworkingStrategy` field, see the
 [multi-region setup guide](setup.md#networking-management).
@@ -109,9 +111,9 @@ configured consistently across providers.
 
 ### Identity and permissions are provider-specific
 
-Multi-region deployments within one provider often share one IAM model. In a
-multi-cloud deployment, each provider has its own identity system, permission
-model, audit trail, and credential refresh behavior.
+Multi-region deployments within one provider often share one IAM (Identity and
+Access Management) model. In a multi-cloud deployment, each provider has its own
+identity system, permission model, audit trail, and credential refresh behavior.
 
 Plan for:
 
@@ -148,6 +150,8 @@ DocumentDB cluster health, monitor:
 - Cross-provider latency and packet loss.
 - Provider load balancer health.
 - DNS record propagation and client resolution.
+- Basic connectivity between database nodes and the application layer.
+- Application gateway selection, to avoid unintended cross-cloud latency.
 
 For a playground observability example, see the
 [multi-cloud telemetry folder](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment/telemetry).