diff --git a/docs/operator-public-documentation/preview/architecture/overview.md b/docs/operator-public-documentation/preview/architecture/overview.md
index 2c0e16a23..d8f037741 100644
--- a/docs/operator-public-documentation/preview/architecture/overview.md
+++ b/docs/operator-public-documentation/preview/architecture/overview.md
@@ -309,7 +309,7 @@ flowchart LR
     NEW_PVC --> NEW_CLUSTER
 ```
 
-For backup configuration, see [Backup and Restore](../backup-and-restore.md).
+For backup configuration, see [Backup and Restore](../operations/backup-and-restore.md).
 
 ## Dependencies
 
diff --git a/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md b/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md
index 8a872280d..d843e361f 100644
--- a/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md
+++ b/docs/operator-public-documentation/preview/getting-started/deploy-on-aks.md
@@ -72,7 +72,7 @@ For available classes, see
 
 - [AKS storage classes](https://learn.microsoft.com/azure/aks/concepts-storage#storage-classes)
 - [Azure Disk CSI driver on AKS](https://learn.microsoft.com/azure/aks/azure-disk-csi)
-- [DocumentDB storage configuration](../advanced-configuration/README.md#storage-configuration)
+- [DocumentDB storage configuration](../configuration/storage.md)
 
 ## Monitoring and troubleshooting
 
@@ -111,7 +111,7 @@ kubectl get pods -n kube-system | grep csi-azuredisk
   enabled
 - [Encryption at rest](https://learn.microsoft.com/azure/aks/enable-host-encryption)
   on managed disks
-- [TLS configuration](../advanced-configuration/README.md#tls-configuration)
+- [TLS configuration](../configuration/tls.md)
   for database traffic
 - [Azure RBAC integration](https://learn.microsoft.com/azure/aks/manage-azure-rbac)
 
diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
new file mode 100644
index 000000000..6baf64a6f
--- /dev/null
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/multi-cloud.md
@@ -0,0 +1,174 @@
+---
+title: Multi-cloud differences
+description: Understand what changes when you extend a multi-region DocumentDB deployment across AKS, GKE, and EKS instead of staying within one cloud provider.
+tags:
+  - multi-region
+  - multi-cloud
+  - networking
+  - istio
+  - disaster-recovery
+---
+
+## Overview
+
+A multi-cloud DocumentDB deployment is a multi-region deployment where the
+participating Kubernetes clusters run in different cloud providers. The
+[primary-replica model](overview.md#primary-replica-model), replication
+configuration, and failover behavior are the same as in any multi-region
+deployment.
+
+Use the multi-region docs as the baseline for topology, setup, and failover:
+
+- [Multi-region deployment overview](overview.md)
+- [Multi-region setup guide](setup.md)
+- [Multi-region failover procedures](failover-procedures.md)
+
+This page covers only the differences that matter when those Kubernetes clusters
+span AKS, GKE, EKS, or another provider mix.
+
+## What changes in multi-cloud deployments
+
+### Provider deployment guides still apply
+
+Start with the single-provider deployment guides when you need provider-specific
+Kubernetes prerequisites, storage defaults, authentication setup, and service
+exposure behavior. The multi-cloud layer changes how the Kubernetes clusters
+connect to each other; it doesn't replace the provider-specific setup work for
+each member Kubernetes cluster.
+
+- [Deploy on AKS](../getting-started/deploy-on-aks.md)
+- [Deploy on EKS](../getting-started/deploy-on-eks.md)
+- [Deploy on GKE](../getting-started/deploy-on-gke.md)
+
+Set the `clusterList[].environment` field to identify the cloud provider that hosts each
+Kubernetes cluster, so the provider-specific settings from those guides are added automatically.
+
+```yaml title="documentdb.yaml"
+apiVersion: documentdb.io/preview
+kind: DocumentDB
+metadata:
+  name: documentdb-preview
+  namespace: documentdb-preview-ns
+spec:
+  clusterReplication:
+    primary: azure-cluster
+    clusterList:
+      - name: aws-cluster
+        environment: eks
+      - name: azure-cluster
+        environment: aks
+      - name: gcp-cluster
+        environment: gke
+```
+
+### Networking becomes the primary design decision
+
+In a single cloud provider, you can usually rely on a cohesive private
+networking model such as VNet peering, VPC peering, private DNS, and provider
+load balancers. In a multi-cloud deployment, those assumptions don't hold across
+every Kubernetes cluster. You need an explicit cross-cloud networking design for
+replication traffic, service discovery, and operational access.
+
+Common approaches include:
+
+- **Cross Kubernetes cluster mesh:** Use east-west gateways, shared trust,
+  remote secrets, and mesh service discovery to route replication traffic across
+  providers. Tools like Istio can manage this automatically.
+- **Site-to-site VPNs:** Connect cloud networks with VPN tunnels when your
+  organization needs private IP connectivity but doesn't have one native
+  provider network that spans every Kubernetes cluster.
+- **Cloud interconnect or private WAN:** Use dedicated connectivity when latency,
+  bandwidth, or compliance requirements exceed what internet-routed VPN or mesh
+  gateways can provide.
+- **Public load balancers with strict controls:** Use provider load balancers
+  only when private connectivity isn't available, and pair them with TLS,
+  firewall restrictions, and tightly scoped access.
+
+The playground uses Istio as the reference approach because it works across
+AKS, GKE, and EKS without requiring one provider's private networking model to cover
+all member Kubernetes clusters. For implementation details, see the
+[multi-cloud deployment playground](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment)
+and the upstream [Istio multi-primary multi-network documentation](https://istio.io/latest/docs/setup/install/multicluster/multi-primary_multi-network/).
+For high-throughput production workloads, consult networking specialists
+to select the most performant tools for your needs.
+
+For operator-level networking configuration, including the
+`crossCloudNetworkingStrategy` field, see the
+[multi-region setup guide](setup.md#networking-management).
+
+### Service discovery needs a cross-provider plan
+
+DocumentDB replication depends on
+[stable service names and reachable endpoints](overview.md#dns-and-service-discovery).
+Within one provider, private DNS and network peering are often enough. Across
+providers, decide how each Kubernetes cluster resolves and reaches the services
+in the other Kubernetes clusters.
+
+If you use Istio, remote secrets and east-west gateways provide the discovery
+and routing path. If you use site-to-site VPNs or private WAN connectivity, make
+sure DNS zones, conditional forwarding, firewall rules, and route tables are
+configured consistently across providers.
+
+### Identity and permissions are provider-specific
+
+Multi-region deployments within one provider often share one IAM (Identity and
+Access Management) model. In a multi-cloud deployment, each provider has its own
+identity system, permission model, audit trail, and credential refresh behavior.
+
+Plan for:
+
+- Separate cloud identities for AKS, GKE, and EKS operations.
+- Kubernetes RBAC on every member Kubernetes cluster.
+- Fleet or GitOps permissions for resource propagation.
+- Provider-specific permissions for load balancers, disks, DNS, and network
+  changes.
+
+The playground README lists the required provider tools and permissions in
+[Prerequisites](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#prerequisites).
+
+### DNS and client routing need an external source of truth
+
+After failover, clients must reach the promoted primary DocumentDB cluster. In a
+single cloud provider, your DNS and traffic-routing options may be standardized.
+Across providers, choose a source of truth that all clients can use, such as a
+shared DNS zone, global traffic manager, or application configuration system.
+
+The playground can create Azure DNS records, including MongoDB SRV records, as
+one example. See
+[Azure DNS configuration](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#azure-dns-configuration)
+and [Connect to database](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#connect-to-database).
+
+### Observability must include the network layer
+
+Multi-cloud failures often show up first as networking symptoms: gateway health,
+DNS resolution, route changes, packet loss, or certificate trust issues. In
+addition to monitoring [replication lag](overview.md#replication-lag) and
+DocumentDB cluster health, monitor:
+
+- Istio control plane and east-west gateway health.
+- VPN or interconnect tunnel status.
+- Cross-provider latency and packet loss.
+- Provider load balancer health.
+- DNS record propagation and client resolution.
+- Basic connectivity between database nodes and the application layer.
+- Application gateway selection, to avoid unintended cross-cloud latency.
+
+For a playground observability example, see the
+[multi-cloud telemetry folder](https://github.com/documentdb/documentdb-kubernetes-operator/tree/main/documentdb-playground/multi-cloud-deployment/telemetry).
+
+## When to use the playground
+
+Use the multi-cloud playground when you want a runnable AKS, GKE, and EKS
+reference deployment. Keep the public docs as the conceptual baseline, and use
+the playground for exact commands, environment variables, templates, and cleanup
+steps:
+
+- [deploy.sh](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/deploy.sh)
+- [deploy-documentdb.sh](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/deploy-documentdb.sh)
+- [Multi-cloud failover operations](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#failover-operations)
+- [Multi-cloud troubleshooting](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md#troubleshooting)
+
+## Next steps
+
+- [Multi-region failover procedures](failover-procedures.md)
+- [Multi-cloud playground](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/multi-cloud-deployment/README.md)
diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/overview.md b/docs/operator-public-documentation/preview/multi-region-deployment/overview.md
index 341b3e7df..fe61eff19 100644
--- a/docs/operator-public-documentation/preview/multi-region-deployment/overview.md
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/overview.md
@@ -94,7 +94,7 @@ Use cloud-native VNet/VPC peering for direct Kubernetes cluster-to-cluster commu
 DocumentDB replication requires these ports between Kubernetes clusters:
 
 | Port | Protocol | Purpose                                  |
-| | | |
+|------|----------|------------------------------------------|
 | 5432 | TCP      | PostgreSQL streaming replication         |
 | 443  | TCP      | Kubernetes API (for KubeFleet, optional) |
 
diff --git a/docs/operator-public-documentation/preview/multi-region-deployment/setup.md b/docs/operator-public-documentation/preview/multi-region-deployment/setup.md
index 5754ac235..bbf0fe298 100644
--- a/docs/operator-public-documentation/preview/multi-region-deployment/setup.md
+++ b/docs/operator-public-documentation/preview/multi-region-deployment/setup.md
@@ -402,6 +402,6 @@ If PVCs aren't provisioning:
 ## Next steps
 
 - [Failover procedures](failover-procedures.md) - Learn how to perform planned and unplanned failovers
-- [Backup and restore](../backup-and-restore.md) - Configure multi-region backup strategies
+- [Backup and restore](../operations/backup-and-restore.md) - Configure multi-region backup strategies
 - [TLS configuration](../configuration/tls.md) - Secure connections with proper TLS certificates
 - [AKS Fleet deployment example](https://github.com/documentdb/documentdb-kubernetes-operator/blob/main/documentdb-playground/aks-fleet-deployment/README.md) - Automated Azure multi-region setup
diff --git a/mkdocs.yml b/mkdocs.yml
index 40bbb14e3..838fef48f 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -59,10 +59,11 @@ nav:
       - Monitoring:
           - Overview: preview/monitoring/overview.md
           - Metrics Reference: preview/monitoring/metrics.md
-      - Multi-Region Deployment:
+      - Multi-region deployment:
           - Overview: preview/multi-region-deployment/overview.md
-          - Setup Guide: preview/multi-region-deployment/setup.md
-          - Failover Procedures: preview/multi-region-deployment/failover-procedures.md
+          - Setup guide: preview/multi-region-deployment/setup.md
+          - Failover procedures: preview/multi-region-deployment/failover-procedures.md
+          - Multi-cloud differences: preview/multi-region-deployment/multi-cloud.md
       - FAQ: preview/faq.md
       - Tools:
           - Kubectl Plugin: preview/kubectl-plugin.md