@@ -309,7 +309,7 @@ flowchart LR
NEW_PVC --> NEW_CLUSTER
```

For backup configuration, see [Backup and Restore](../backup-and-restore.md).
For backup configuration, see [Backup and Restore](../operations/backup-and-restore.md).

## Dependencies

@@ -72,7 +72,7 @@ For available classes, see

- [AKS storage classes](https://learn.microsoft.com/azure/aks/concepts-storage#storage-classes)
- [Azure Disk CSI driver on AKS](https://learn.microsoft.com/azure/aks/azure-disk-csi)
- [DocumentDB storage configuration](../advanced-configuration/README.md#storage-configuration)
- [DocumentDB storage configuration](../configuration/storage.md)

## Monitoring and troubleshooting

@@ -111,7 +111,7 @@ kubectl get pods -n kube-system | grep csi-azuredisk
enabled
- [Encryption at rest](https://learn.microsoft.com/azure/aks/enable-host-encryption)
on managed disks
- [TLS configuration](../advanced-configuration/README.md#tls-configuration)
- [TLS configuration](../configuration/tls.md)
for database traffic
- [Azure RBAC integration](https://learn.microsoft.com/azure/aks/manage-azure-rbac)

@@ -0,0 +1,174 @@
---
title: Multi-cloud differences
description: Understand what changes when you extend a multi-region DocumentDB deployment across AKS, GKE, and EKS instead of staying within one cloud provider.
tags:
- multi-region
- multi-cloud
- networking
- istio
- disaster-recovery
---

## Overview

A multi-cloud DocumentDB deployment is a multi-region deployment where the
participating Kubernetes clusters run in different cloud providers. The
[primary-replica model](overview.md#primary-replica-model), replication
configuration, and failover behavior are the same as any multi-region
deployment.

Use the multi-region docs as the baseline for topology, setup, and failover:

- [Multi-region deployment overview](overview.md)
- [Multi-region setup guide](setup.md)
- [Multi-region failover procedures](failover-procedures.md)

This page covers only the differences that matter when those Kubernetes clusters
span AKS, GKE, EKS, or another provider mix.

## What changes in multi-cloud deployments

### Provider deployment guides still apply

Start with the single-provider deployment guides when you need provider-specific
Kubernetes prerequisites, storage defaults, authentication setup, and service
exposure behavior. The multi-cloud layer changes how the Kubernetes clusters
connect to each other; it doesn't replace the provider-specific setup work for
each member Kubernetes cluster.

- [Deploy on AKS](../getting-started/deploy-on-aks.md)
- [Deploy on EKS](../getting-started/deploy-on-eks.md)
- [Deploy on GKE](../getting-started/deploy-on-gke.md)

Set the `clusterList[].environment` field on each member Kubernetes cluster to
identify its cloud provider, so that the provider-specific additions from those
guides are applied automatically.

```yaml title="documentdb.yaml"
apiVersion: documentdb.io/preview
kind: DocumentDB
metadata:
  name: documentdb-preview
  namespace: documentdb-preview-ns
spec:
  clusterReplication:
    primary: azure-cluster
    clusterList:
      - name: aws-cluster
        environment: eks
      - name: azure-cluster
        environment: aks
      - name: gcp-cluster
        environment: gke
```

### Networking becomes the primary design decision

In a single cloud provider, you can usually rely on a cohesive private
networking model such as VNet peering, VPC peering, private DNS, and provider
load balancers. In a multi-cloud deployment, those assumptions don't hold across
every Kubernetes cluster. You need an explicit cross-cloud networking design for
replication traffic, service discovery, and operational access.

Common approaches include:

- **Cross Kubernetes cluster mesh:** Use east-west gateways, shared trust,
remote secrets, and mesh service discovery to route replication traffic across
providers. Tools like Istio can manage this automatically.
- **Site-to-site VPNs:** Connect cloud networks with VPN tunnels when your
organization needs private IP connectivity but doesn't have one native
provider network that spans every Kubernetes cluster.
- **Cloud interconnect or private WAN:** Use dedicated connectivity when latency,
bandwidth, or compliance requirements exceed what internet-routed VPN or mesh
gateways can provide.
- **Public load balancers with strict controls:** Use provider load balancers
only when private connectivity isn't available, and pair them with TLS,
firewall restrictions, and tightly scoped access.

The playground uses Istio as the reference approach because it works easily across
AKS, GKE, and EKS without requiring one provider's private networking model to cover
all member Kubernetes clusters. For implementation details, see the
[multi-cloud deployment playground](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment)
and the upstream [Istio multi-primary multi-network documentation](https://istio.io/latest/docs/setup/install/multicluster/multi-primary_multi-network/).
For high-throughput production workloads, consult networking specialists
to select the best-performing tools for your needs.

For operator-level networking configuration, including the
`crossCloudNetworkingStrategy` field, see the
[multi-region setup guide](setup.md#networking-management).
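
As a rough sketch, the fragment below shows where the strategy might be set for the three-provider example used earlier on this page. The placement of `crossCloudNetworkingStrategy` next to `primary` and `clusterList` is an assumption, and the value `Istio` is taken from the review discussion of this page rather than from the API reference, so confirm both in the setup guide before relying on them.

```yaml
spec:
  clusterReplication:
    # Assumption: the field sits under clusterReplication; check the setup guide.
    # Assumption: Istio is an accepted value (mentioned alongside AzureFleet and None).
    crossCloudNetworkingStrategy: Istio
    primary: azure-cluster
    clusterList:
      - name: aws-cluster
        environment: eks
      - name: azure-cluster
        environment: aks
      - name: gcp-cluster
        environment: gke
```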

Review comment (Collaborator):

Quick question on this section — I might be reading the API wrong, but my understanding is that crossCloudNetworkingStrategy only accepts Istio, AzureFleet, or None. The 4 approaches listed here seem to mix two different things:

  • Istio multi-cluster mesh — maps directly to strategy: Istio
  • Site-to-site VPN / cloud interconnect / public LBs — read more like underlying network transports rather than operator strategies?

If that's right, then for multi-cloud Istio would be the only operator-supported strategy, and the other three would still need strategy: Istio set on top to actually do cross-cluster service discovery — otherwise the operator would fall back to None and not know how to route replication. A reader picking "Site-to-site VPN" alone might end up in that state.

Could you clarify whether the section is meant to list operator strategies, network transports, or both? Happy to be wrong here — want to make sure I understand before suggesting changes.

Reply (Collaborator, PR author):

The multi-region docs already cover the operator modes for inter-connectivity extensively, so this section is meant to cover how to use those or other strategies for networking between Kubernetes clusters. For `crossCloudNetworkingStrategy`, I link here to the multi-region docs, which cover the operator side. If it makes it clearer, I can fold the Istio point into the public load balancer point, as that's what Istio does.


### Service discovery needs a cross-provider plan

DocumentDB replication depends on
[stable service names and reachable endpoints](overview.md#dns-and-service-discovery).
Within one provider, private DNS and network peering are often enough. Across
providers, decide how each Kubernetes cluster resolves and reaches the services
in the other Kubernetes clusters.

If you use Istio, remote secrets and east-west gateways provide the discovery
and routing path. If you use site-to-site VPNs or private WAN connectivity, make
sure DNS zones, conditional forwarding, firewall rules, and route tables are
configured consistently across providers.
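
For the Istio path, the upstream multi-primary multi-network guide exposes in-mesh services through the east-west gateway with a `Gateway` resource similar to the sketch below. This follows the upstream Istio sample, not the playground's own manifests, so treat the resource name, namespace, and API version as assumptions and check what your Istio release serves.

```yaml
# Sketch based on the upstream Istio multi-network sample.
# Assumption: the east-west gateway pods carry the label istio=eastwestgateway.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    istio: eastwestgateway
  servers:
    - port:
        number: 15443        # conventional mTLS port for cross-network traffic
        name: tls
        protocol: TLS
      tls:
        mode: AUTO_PASSTHROUGH   # route by SNI without terminating TLS
      hosts:
        - "*.local"              # expose in-mesh services to remote clusters
```

Each member Kubernetes cluster needs an equivalent gateway, plus remote secrets so that the other control planes can discover its endpoints.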

### Identity and permissions are provider-specific

Multi-region deployments within one provider often share one IAM (Identity and
Access Management) model. In a multi-cloud deployment, each provider has its own
identity system, permission model, audit trail, and credential refresh behavior.

Plan for:

- Separate cloud identities for AKS, GKE, and EKS operations.
- Kubernetes RBAC on every member Kubernetes cluster.
- Fleet or GitOps permissions for resource propagation.
- Provider-specific permissions for load balancers, disks, DNS, and network
changes.

The playground README lists the required provider tools and permissions in
[Prerequisites](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#prerequisites).

### DNS and client routing need an external source of truth

After failover, clients must reach the promoted primary DocumentDB cluster. In a
single cloud provider, your DNS and traffic-routing options may be standardized.
Across providers, choose a source of truth that all clients can use, such as a
shared DNS zone, global traffic manager, or application configuration system.

The playground can create Azure DNS records, including MongoDB SRV records, as
one example. See
[Azure DNS configuration](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#azure-dns-configuration)
and [Connect to database](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#connect-to-database).

### Observability must include the network layer

Multi-cloud failures often show up first as networking symptoms: gateway health,
DNS resolution, route changes, packet loss, or certificate trust issues. In
addition to monitoring [replication lag](overview.md#replication-lag) and
DocumentDB cluster health, monitor:

- Istio control plane and east-west gateway health.
- VPN or interconnect tunnel status.
- Cross-provider latency and packet loss.
- Provider load balancer health.
- DNS record propagation and client resolution.
- Basic connectivity between database nodes and the application layer.
- Application gateway selection, to avoid unintended cross-cloud latency.

For a playground observability example, see the
[multi-cloud telemetry folder](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment/telemetry).

## When to use the playground

Use the multi-cloud playground when you want a runnable AKS, GKE, and EKS
reference deployment. Keep the public docs as the conceptual baseline, and use
the playground for exact commands, environment variables, templates, and cleanup
steps:

- [deploy.sh](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/deploy.sh)
- [deploy-documentdb.sh](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/deploy-documentdb.sh)
- [Multi-cloud failover operations](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#failover-operations)
- [Multi-cloud troubleshooting](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#troubleshooting)

## Next steps

- [Multi-region failover procedures](failover-procedures.md)
- [Multi-cloud playground](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md)
@@ -94,7 +94,7 @@ Use cloud-native VNet/VPC peering for direct Kubernetes cluster-to-cluster commu
DocumentDB replication requires these ports between Kubernetes clusters:

| Port | Protocol | Purpose |
| | | |
|------|----------|------------------------------------------|
| 5432 | TCP | PostgreSQL streaming replication |
| 443 | TCP | Kubernetes API (for KubeFleet, optional) |

@@ -402,6 +402,6 @@ If PVCs aren't provisioning:
## Next steps

- [Failover procedures](failover-procedures.md) - Learn how to perform planned and unplanned failovers
- [Backup and restore](../backup-and-restore.md) - Configure multi-region backup strategies
- [Backup and restore](../operations/backup-and-restore.md) - Configure multi-region backup strategies
- [TLS configuration](../configuration/tls.md) - Secure connections with proper TLS certificates
- [AKS Fleet deployment example](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/aks-fleet-deployment/README.md) - Automated Azure multi-region setup
7 changes: 4 additions & 3 deletions mkdocs.yml
@@ -59,10 +59,11 @@ nav:
- Monitoring:
- Overview: preview/monitoring/overview.md
- Metrics Reference: preview/monitoring/metrics.md
- Multi-Region Deployment:
- Multi-region deployment:
- Overview: preview/multi-region-deployment/overview.md
- Setup Guide: preview/multi-region-deployment/setup.md
- Failover Procedures: preview/multi-region-deployment/failover-procedures.md
- Setup guide: preview/multi-region-deployment/setup.md
- Failover procedures: preview/multi-region-deployment/failover-procedures.md
- Multi-cloud differences: preview/multi-region-deployment/multi-cloud.md
- FAQ: preview/faq.md
- Tools:
- Kubectl Plugin: preview/kubectl-plugin.md