Commit ffcf37d

CH-189 docs for cluster configuration
1 parent a7d4354 commit ffcf37d

File tree

9 files changed

+266
-6
lines changed

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# Cluster configuration

Cloud Harness deploys with Helm and uses an ingress controller to serve application endpoints.

The ingress controller is not installed as part of the application and has to be installed separately. The installation process can differ depending on the Kubernetes provider.

See [here](../../infrastructure/cluster-configuration/README.md) for more information and resources for setting up your cluster.
Lines changed: 34 additions & 2 deletions
@@ -1,4 +1,36 @@
# Cluster configuration

## Simple setup on an existing cluster

### TL;DR

1. Create a Kubernetes cluster (e.g. minikube or Google Cloud)
1. Initialize kubectl credentials to work with your cluster
1. Run `source cluster-init.sh`

### Cert-manager

Follow [these](https://cert-manager.io/docs/installation/kubernetes/) instructions to deploy cert-manager.
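
The linked instructions boil down to a Helm install along these lines (this mirrors what `cluster-init.sh` does; the version is pinned as an example, check the cert-manager docs for the current release):

```shell
helm repo add jetstack https://charts.jetstack.io --force-update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.17.2 \
  --set crds.enabled=true
```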

### Ingress

The ingress controller is the entry point of all CloudHarness applications.
Info on how to deploy nginx-ingress can be found [here](https://kubernetes.github.io/ingress-nginx/deploy/).

```
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx
```

On local clusters and GCP/GKE, the nginx-ingress chart deploys a load balancer with a given IP address. Use that address to create the CNAME and A records for the website.
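
To look up the allocated address, you can inspect the controller service (the service name depends on the Helm release name; `ingress-nginx-controller` below assumes a release named `ingress-nginx`):

```shell
kubectl get svc ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```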

## GCP GKE cluster setup

GKE setup is pretty straightforward: you can create a cluster and a node pool from the Google console, and internet-facing load balancers are created directly with the ingress controller.

For additional info see [here](gcp-setup.md).
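
As an alternative to the console, the cluster can be created with `gcloud`; a sketch, where the cluster name, zone, node count and machine type are example values to adapt:

```shell
gcloud container clusters create my-cluster \
  --zone us-central1-a \
  --num-nodes 2 \
  --machine-type e2-standard-4
```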

## AWS EKS setup

AWS requires some additional steps to install the load balancer and the ingress; see [here](./aws-setup.md).
Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@

## Create the EKS cluster

There are many ways to create a cluster; which one to use depends on requirements that are outside of this generic scope.

This is a good starting point: https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html.
If in doubt, an [Auto Mode cluster](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-automode.html) is a good place to start.
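
For a quick non-Auto-Mode start, `eksctl` can create a working cluster in one command; a sketch, where the name, region, node count and instance type are example values:

```shell
eksctl create cluster \
  --name metacell-dev \
  --region us-west-2 \
  --nodes 2 \
  --node-type t3.large
```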

## Ingress setup

The following is inspired by https://aws.amazon.com/blogs/containers/exposing-kubernetes-applications-part-3-nginx-ingress-controller/, section "Exposing Ingress-Nginx Controller via a Load Balancer".
Be aware that the article is from 2022 and no longer works exactly as written.
The steps below are the ones that worked for us as of May 2025.

### Set up the policy and service account

Note that the version of the aws-load-balancer-controller has to match the policy: a version mismatch will make the installation fail.

```bash
curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json
aws iam create-policy --policy-name AWSLoadBalancerControllerIAMPolicy --policy-document file://iam-policy.json
AWS_ACCOUNT=527966638683
eksctl create iamserviceaccount \
  --cluster=metacell-dev \
  --name=aws-load-balancer-controller \
  --namespace=kube-system \
  --attach-policy-arn=arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy \
  --approve
```

### Install the aws-load-balancer-controller

First, apply the custom resource definitions:

```bash
wget https://raw.githubusercontent.com/aws/eks-charts/refs/heads/master/stable/aws-load-balancer-controller/crds/crds.yaml
kubectl apply -f crds.yaml
```

Then install the Helm chart, from https://github.com/aws/eks-charts/tree/master/stable/aws-load-balancer-controller:

```bash
helm repo add eks https://aws.github.io/eks-charts
# If using IAM Roles for the service account, install as follows - NOTE: you need to specify both of the chart values `serviceAccount.create=false` and `serviceAccount.name=aws-load-balancer-controller`
helm install aws-load-balancer-controller eks/aws-load-balancer-controller --set clusterName=metacell-dev -n kube-system --set serviceAccount.create=false --set serviceAccount.name=aws-load-balancer-controller
```

### Fix the VPC

If you encounter the following error related to the VPC:

> {"level":"info","ts":"2025-05-21T13:53:48Z","msg":"version","GitVersion":"v2.13.2","GitCommit":"4236bd7928711874ae4d8aff6b97870b5625140f","BuildDate":"2025-05-15T17:37:55+0000"}
> {"level":"error","ts":"2025-05-21T13:53:53Z","logger":"setup","msg":"unable to initialize AWS cloud","error":"failed to get VPC ID: failed to fetch VPC ID from instance metadata: error in fetching vpc id through ec2 metadata: get mac metadata: operation error ec2imds: GetMetadata, canceled, context deadline exceeded"}

first get the VPC id:

```bash
aws eks describe-cluster \
  --name metacell-dev \
  --region us-west-2 \
  --query "cluster.resourcesVpcConfig.vpcId" \
  --output text
```

Then set the VPC id value on the chart:

```bash
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --reuse-values --set vpcId=$VPC_ID
```

### Install ingress-nginx

```bash
helm upgrade -i ingress-nginx ingress-nginx/ingress-nginx \
  --namespace kube-system \
  --values ingress/values-aws.yaml

kubectl -n kube-system rollout status deployment ingress-nginx-controller

kubectl get deployment -n kube-system ingress-nginx-controller
```

### Associate the DNS

The endpoint can be assigned with two CNAME entries.
For instance, if you run `harness-deployment ... -d myapp.mydomain.com`,
the following CNAME entries are needed:

- myapp [LB_ADDRESS]
- *.myapp [LB_ADDRESS]

The easiest way to get the load balancer address is to do the deployment and read it from the ingress with

```
kubectl get ingress
```
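
If you prefer to script it, the address can be extracted directly; a sketch, assuming the first ingress in the namespace is the relevant one (AWS network load balancers expose a hostname rather than an IP):

```shell
kubectl get ingress \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}'
```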

## Storage class

EKS does not provide a default storage class.
To create one, run

```bash
kubectl apply -f storageclass-default-aws.yaml
```
108+
109+
## Container registry
110+
111+
CloudHarness pushes images on a container registry, which has to be readable from EKS
112+
113+
Any public registry can be used seamlessly, while ECR is recommended to pull private images
114+
115+
1. Create a new ECR registry
116+
2. Create all the repositories within the deployment (ECR does not create repositories automatically on push, unless this is implemented https://aws.amazon.com/blogs/containers/dynamically-create-repositories-upon-image-push-to-amazon-ecr/)
117+
3. Give the permissions to the Node IAM role
118+
https://docs.aws.amazon.com/AmazonECR/latest/userguide/ECR_on_EKS.html (the role should be AmazonSSMRoleForInstancesQuickSetup for Auto Mode Clusters)
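
Since ECR needs one repository per image, step 2 can be scripted. A sketch, with hypothetical repository names (use the image names your deployment actually builds) and the commands echoed as a dry run - drop the `echo` to really create them:

```shell
REGION=us-west-2
# Hypothetical example repositories; replace with your deployment's image names
REPOS="cloudharness-base accounts workflows"
for repo in $REPOS; do
  # Remove the echo to actually create the repositories
  echo aws ecr create-repository --repository-name "$repo" --region "$REGION"
done
```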

To push images, you have to authenticate to the registry.

To authenticate from the local console, the command looks like the following:

```bash
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 527966638683.dkr.ecr.us-west-2.amazonaws.com
```

The exact command can also be viewed by hitting "View push commands" in the web console.
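
After logging in, pushing an image follows the usual Docker tag-and-push flow. A sketch, where the image name is a hypothetical example and the commands are echoed as a dry run:

```shell
ACCOUNT=527966638683
REGION=us-west-2
REGISTRY=${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com
# Tag the locally built image for its ECR repository, then push
# (remove the echo to actually run the commands)
echo docker tag myapp:latest ${REGISTRY}/myapp:latest
echo docker push ${REGISTRY}/myapp:latest
```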
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# Azure AKS setup

The main complication with AKS is that the nginx ingress controller is not easy to set up, so the AGIC controller is preferred.

1. Create a new public IP for the AGIC load balancer
2. Create the application gateway
3. Enable the AGIC add-on in the existing AKS cluster with the Azure CLI
4. Peer the AKS and AG virtual networks together

See https://learn.microsoft.com/en-us/azure/application-gateway/tutorial-ingress-controller-add-on-existing.
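
The four steps above map roughly to the following Azure CLI commands; resource names are example values, and several parameters (virtual network, subnet, ...) are omitted - see the linked tutorial for the full set:

```shell
# 1. Public IP for the AGIC load balancer
az network public-ip create -g myResourceGroup -n agicPublicIp --sku Standard
# 2. Application gateway (needs vnet/subnet parameters, see the tutorial)
az network application-gateway create -g myResourceGroup -n myAppGateway \
  --sku Standard_v2 --public-ip-address agicPublicIp
# 3. Enable the AGIC add-on on the existing cluster
az aks enable-addons -g myResourceGroup -n myCluster \
  -a ingress-appgw --appgw-id <APPGW_RESOURCE_ID>
# 4. Peering of the AKS and Application Gateway virtual networks is done with
#    `az network vnet peering create`, once in each direction
```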

## Adapt / override the Ingress template

In addition to this, the [Ingress template](../../deployment-configuration/helm/templates/ingress.yaml) has to be adapted to use the AGIC controller.

An adaptation of the Ingress template that optionally supports AGIC is the following:

```yaml
kind: Ingress
metadata:
  name: {{ .Values.ingress.name | quote }}
  annotations:
  {{- if eq .Values.ingress.className "azure-application-gateway" }}
    appgw.ingress.kubernetes.io/backend-path-prefix: /
    {{- if $tls }}
    appgw.ingress.kubernetes.io/appgw-ssl-certificate: "nwwcssl"
    {{- end }}
    appgw.ingress.kubernetes.io/ssl-redirect: {{ (and $tls .Values.ingress.ssl_redirect) | quote }}
  {{- else }}
    nginx.ingress.kubernetes.io/ssl-redirect: {{ (and $tls .Values.ingress.ssl_redirect) | quote }}
    nginx.ingress.kubernetes.io/proxy-body-size: '{{ .Values.proxy.payload.max }}m'
    nginx.ingress.kubernetes.io/proxy-buffer-size: '128k'
    nginx.ingress.kubernetes.io/from-to-www-redirect: 'true'
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/auth-keepalive-timeout: {{ .Values.proxy.timeout.keepalive | quote }}
    nginx.ingress.kubernetes.io/proxy-read-timeout: {{ .Values.proxy.timeout.read | quote }}
    nginx.ingress.kubernetes.io/proxy-send-timeout: {{ .Values.proxy.timeout.send | quote }}
    nginx.ingress.kubernetes.io/use-forwarded-headers: {{ .Values.proxy.forwardedHeaders | quote }}
  {{- end }}
  {{- if and (and (not .Values.local) (not .Values.certs)) $tls }}
    kubernetes.io/tls-acme: 'true'
    cert-manager.io/issuer: {{ printf "%s-%s" "letsencrypt" .Values.namespace }}
  {{- end }}
```
Lines changed: 12 additions & 1 deletion
@@ -1,3 +1,14 @@
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress ingress-nginx/ingress-nginx -f ingress/values.yaml -n kube-system

helm repo add jetstack https://charts.jetstack.io --force-update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.17.2 \
  --set crds.enabled=true
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
## Google GKE setup

The easiest way to create the cluster is to use the GCP web console. Define a node pool that satisfies the application requirements (no other particular requirements are generally necessary for the node pools).
The node pool can always be scaled to a different node count, so the only important choice is the node type.
Other node pools can also be added to the cluster at any point in time to replace the original one.

After creating the cluster, use the `gcloud` command line client to get the credentials on your machine:

- `gcloud init`
- `gcloud container clusters get-credentials --zone us-central1-a <CLUSTER_NAME>`
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
controller:
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-name: apps-ingress
      service.beta.kubernetes.io/aws-load-balancer-type: external
      service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: 10254
  config:
    http-snippet: |
      proxy_cache_path /tmp/nginx-cache levels=1:2 keys_zone=static-cache:2m max_size=100m inactive=7d use_temp_path=off;
      proxy_cache_key $scheme$proxy_host$request_uri;
      proxy_cache_lock on;
      proxy_cache_use_stale updating;
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
reclaimPolicy: Delete
provisioner: ebs.csi.eks.amazonaws.com
volumeBindingMode: Immediate
---
Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
reclaimPolicy: Delete
provisioner: kubernetes.io/gce-pd
volumeBindingMode: Immediate
---
