|
| 1 | +# External Resource Classes Configuration |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Nx Cloud supports configuring resource classes for agents through an external ConfigMap. This feature allows you to define different resource classes for your agents without having to rebuild the Nx Cloud image or modify the Helm chart values directly. |
| 6 | + |
| 7 | +## Requirements |
| 8 | + |
| 9 | +This feature is only available with the following versions: |
| 10 | + |
| 11 | +- nx-cloud chart version >= 0.15.15 |
| 12 | +- nx-agents chart version >= 1.2.11 |
| 13 | +- Nx Cloud image versions 2504.01 and newer |
| 14 | + |
| 15 | +## Configuring the Application |
| 16 | + |
| 17 | +### Recommended Approach: Using extraManifests |
| 18 | + |
| 19 | +The simplest way to configure resource classes is to define the ConfigMap directly in your Helm values file using the `extraManifests` feature. This allows you to deploy the ConfigMap alongside the chart without having to create it separately: |
| 20 | + |
| 21 | +```yaml |
| 22 | +extraManifests: |
| 23 | +  resource-class-config: |
| 24 | +    apiVersion: v1 |
| 25 | +    kind: ConfigMap |
| 26 | +    metadata: |
| 27 | +      name: resource-class-config |
| 28 | +    data: |
| 29 | +      agentConfigs.yaml: | |
| 30 | +        resourceClasses: |
| 31 | +          - identifier: my-custom-resource-class-small |
| 32 | +            architecture: amd64 |
| 33 | +          - identifier: my-custom-resource-class-medium |
| 34 | +            architecture: amd64 |
| 35 | +          - identifier: my-custom-resource-class-large |
| 36 | +            architecture: amd64 |
| 37 | + |
| 38 | +resourceClassConfiguration: |
| 39 | +  name: "resource-class-config" # Name of the ConfigMap containing the resource class configuration |
| 40 | +  path: "agentConfigs.yaml" # Key within the ConfigMap that holds the configuration |
| 41 | +``` |
| 42 | + |
| 43 | +With this approach, you can define, deploy, and reference the ConfigMap all in one Helm values file, making it easier to manage your configuration. |
| 44 | + |
| 45 | +**Important Notes:** |
| 46 | +- Both `name` and `path` properties are required if this feature is used |
| 47 | +- The `path` value should match the key in your ConfigMap that contains the resource class configuration |
| 48 | + |
| 49 | +### Alternative Approach: Creating a ConfigMap Separately |
| 50 | + |
| 51 | +Alternatively, you can create the ConfigMap separately and then reference it in your Helm values: |
| 52 | + |
| 53 | +1. Create a ConfigMap containing your resource class configuration. The configuration should be in YAML format and follow the structure shown in the example below: |
| 54 | + |
| 55 | +```yaml |
| 56 | +resourceClasses: |
| 57 | +  - identifier: my-custom-resource-class-small |
| 58 | +    architecture: amd64 |
| 59 | + |
| 60 | +  # Add more resource classes as needed |
| 61 | + |
| 62 | +``` |
| 63 | + |
| 64 | +2. Create the ConfigMap using kubectl: |
| 65 | + |
| 66 | +```bash |
| 67 | +kubectl create configmap resource-class-config --from-file=agentConfigs.yaml=/path/to/your/config.yaml -n your-namespace |
| 68 | +``` |
| 69 | + |
| 70 | +3. Update your Nx Cloud Helm values to reference the ConfigMap: |
| 71 | + |
| 72 | +```yaml |
| 73 | +resourceClassConfiguration: |
| 74 | +  name: "resource-class-config" # Name of the ConfigMap containing the resource class configuration |
| 75 | +  path: "agentConfigs.yaml" # Key within the ConfigMap that holds the configuration |
| 76 | +``` |
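|  | + |
|  | +As an alternative to the imperative `kubectl create configmap` command in step 2, the same ConfigMap can be kept as a manifest under version control and applied with `kubectl apply -f`. The following is a minimal sketch that reuses the names from the steps above; `your-namespace` is the same placeholder used in step 2: |
|  | + |
|  | +```yaml |
|  | +# Sketch only: declarative equivalent of the kubectl command in step 2. |
|  | +apiVersion: v1 |
|  | +kind: ConfigMap |
|  | +metadata: |
|  | +  name: resource-class-config |
|  | +  namespace: your-namespace # placeholder, use the namespace of your Nx Cloud release |
|  | +data: |
|  | +  # The key name must match the `path` value in resourceClassConfiguration |
|  | +  agentConfigs.yaml: | |
|  | +    resourceClasses: |
|  | +      - identifier: my-custom-resource-class-small |
|  | +        architecture: amd64 |
|  | +``` |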
| 77 | + |
| 78 | +## How It Works |
| 79 | + |
| 80 | +When this feature is configured: |
| 81 | + |
| 82 | +1. The Nx Cloud Helm chart mounts the specified ConfigMap into the Nx Cloud API and Aggregator pods |
| 83 | +2. The chart sets the environment variable `NX_CLOUD_RESOURCE_CLASS_FILEPATH` to the path of the mounted configuration file |
| 84 | +3. The Nx Cloud services read this configuration and use it to determine the resource requirements for agent pods |
| 85 | + |
| 86 | +## Deployment |
| 87 | + |
| 88 | +Once you've configured your resource classes using one of the approaches above, deploy or upgrade your Nx Cloud Helm release: |
| 89 | + |
| 90 | +```bash |
| 91 | +helm upgrade nx-cloud nx-cloud/nx-cloud -f values.yaml -n nx-cloud |
| 92 | +``` |
| 93 | + |
| 94 | +## Configuring the Workflow Controller |
| 95 | + |
| 96 | +When using custom resource classes, the Workflow Controller must be given a list of resource classes whose identifiers match those provided to the application chart. The configuration follows a similar pattern but goes in the workflow controller's values file. |
| 97 | + |
| 98 | +### Using extraManifests in the Workflow Controller |
| 99 | + |
| 100 | +```yaml |
| 101 | +extraManifests: |
| 102 | +  resourceclasses: |
| 103 | +    apiVersion: v1 |
| 104 | +    kind: ConfigMap |
| 105 | +    metadata: |
| 106 | +      name: agent-configuration |
| 107 | +    data: |
| 108 | +      agentConfigs.yaml: | |
| 109 | +        resourceClasses: |
| 110 | +          - platform: docker |
| 111 | +            architecture: amd64 |
| 112 | +            os: linux |
| 113 | +            identifier: my-custom-resource-class-small |
| 114 | +            size: small |
| 115 | +            cpu: "1" |
| 116 | +            memory: "2Gi" |
| 117 | +            memoryLimit: "3.5Gi" |
| 118 | +            cpuLimit: "1.5" |
| 119 | +          - platform: docker |
| 120 | +            architecture: amd64 |
| 121 | +            os: linux |
| 122 | +            identifier: my-custom-resource-class-medium |
| 123 | +            size: medium |
| 124 | +            cpu: "2" |
| 125 | +            memory: "4Gi" |
| 126 | +            memoryLimit: "5.5Gi" |
| 127 | +            cpuLimit: "3" |
| 128 | +          - platform: docker |
| 129 | +            architecture: amd64 |
| 130 | +            os: linux |
| 131 | +            identifier: my-custom-resource-class-large |
| 132 | +            size: large |
| 133 | +            cpu: "3" |
| 134 | +            memory: "8Gi" |
| 135 | +            memoryLimit: "9.5Gi" |
| 136 | +            cpuLimit: "6" |
| 137 | +        # You can also configure agent affinities if needed |
| 138 | +        agentAffinities: |
| 139 | +          - affinity: |
| 140 | +              nodeAffinity: |
| 141 | +                requiredDuringSchedulingIgnoredDuringExecution: |
| 142 | +                  nodeSelectorTerms: |
| 143 | +                    - matchExpressions: |
| 144 | +                        - key: nx.app/node-role |
| 145 | +                          operator: In |
| 146 | +                          values: |
| 147 | +                            - agent |
| 148 | +            targetClasses: |
| 149 | +              - my-custom-resource-class-small |
| 150 | +              - my-custom-resource-class-medium |
| 151 | +          - affinity: |
| 152 | +              nodeAffinity: |
| 153 | +                requiredDuringSchedulingIgnoredDuringExecution: |
| 154 | +                  nodeSelectorTerms: |
| 155 | +                    - matchExpressions: |
| 156 | +                        - key: cloud.google.com/gke-nodepool |
| 157 | +                          operator: In |
| 158 | +                          values: |
| 159 | +                            - high-performance-pool |
| 160 | +            targetClasses: |
| 161 | +              - my-custom-resource-class-large |
| 162 | +``` |
| 163 | + |
| 164 | +### Configuring the Workflow Controller Deployment |
| 165 | + |
| 166 | +In addition to creating the ConfigMap with resource classes, you need to configure the workflow controller deployment to mount the ConfigMap and set the appropriate environment variable. Add the following to your workflow controller values file: |
| 167 | + |
| 168 | +```yaml |
| 169 | +controller: |
| 170 | +  deployment: |
| 171 | +    # Other deployment configuration... |
| 172 | +    env: |
| 173 | +      # Other environment variables... |
| 174 | +      - name: K8S_RESOURCECLASS_CONFIG_PATH |
| 175 | +        value: "/opt/nx-cloud/resource-classes/agentConfigs.yaml" |
| 176 | +    volumes: |
| 177 | +      - name: resource-classes-config |
| 178 | +        configMap: |
| 179 | +          name: agent-configuration # Must match the name in your extraManifests |
| 180 | +          items: |
| 181 | +            - key: agentConfigs.yaml # Must match the key in your ConfigMap |
| 182 | +              path: agentConfigs.yaml |
| 183 | +    volumeMounts: |
| 184 | +      - name: resource-classes-config |
| 185 | +        mountPath: /opt/nx-cloud/resource-classes |
| 186 | +``` |
| 187 | + |
| 188 | +This configuration: |
| 189 | + |
| 190 | +1. Creates a volume named `resource-classes-config` that references the ConfigMap containing your resource class definitions |
| 191 | +2. Mounts this volume to the path `/opt/nx-cloud/resource-classes` in the workflow controller container |
| 192 | +3. Sets the environment variable `K8S_RESOURCECLASS_CONFIG_PATH` to point to the specific file within the mounted volume |
| 193 | + |
| 194 | +The workflow controller will read this configuration file to determine the resource requirements for agent pods and apply the appropriate affinities when scheduling them. |
| 195 | + |
| 196 | +### Resource Class Properties |
| 197 | + |
| 198 | +#### Required Fields for API and Aggregator |
| 199 | + |
| 200 | +For the Nx Cloud API and Aggregator components, only the following fields are required: |
| 201 | + |
| 202 | +- **identifier**: Unique identifier for the resource class |
| 203 | +- **architecture**: CPU architecture (e.g., amd64, arm64) |
| 204 | + |
| 205 | +#### Additional Fields for Workflow Controller |
| 206 | + |
| 207 | +The workflow controller in the nx-cloud-workflow-controller chart requires all of the following fields, including the two listed above: |
| 208 | + |
| 209 | +- **identifier**: Unique identifier for the resource class |
| 210 | +- **architecture**: CPU architecture (e.g., amd64, arm64) |
| 211 | +- **platform**: The container platform (e.g., docker) |
| 212 | +- **os**: Operating system (e.g., linux) |
| 213 | +- **size**: Human-readable size name |
| 214 | +- **cpu**: Requested CPU resources |
| 215 | +- **memory**: Requested memory resources |
| 216 | +- **cpuLimit**: CPU limit |
| 217 | +- **memoryLimit**: Memory limit |
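|  | + |
|  | +To make the difference between the two field sets concrete, the sketch below defines one resource class in both files. The identifier `my-custom-resource-class-arm` and all CPU and memory values are hypothetical and only illustrate the shape; the identifier must be the same in both places: |
|  | + |
|  | +```yaml |
|  | +# Application chart: agentConfigs.yaml read by the API and Aggregator |
|  | +resourceClasses: |
|  | +  - identifier: my-custom-resource-class-arm # hypothetical name |
|  | +    architecture: arm64 |
|  | +--- |
|  | +# Workflow controller chart: agentConfigs.yaml read by the workflow controller |
|  | +resourceClasses: |
|  | +  - identifier: my-custom-resource-class-arm # must match the entry above |
|  | +    architecture: arm64 |
|  | +    platform: docker |
|  | +    os: linux |
|  | +    size: medium |
|  | +    cpu: "2" # illustrative request |
|  | +    memory: "4Gi" # illustrative request |
|  | +    cpuLimit: "3" # illustrative limit |
|  | +    memoryLimit: "5.5Gi" # illustrative limit |
|  | +``` |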
| 218 | + |
| 219 | +### Agent Affinities |
| 220 | + |
| 221 | +The `agentAffinities` section controls which nodes the agents of each resource class can be scheduled on, using node labels or other Kubernetes affinity rules. Each entry consists of two parts: |
| 222 | + |
| 223 | +1. **affinity**: A standard Kubernetes affinity definition that determines which nodes are eligible for scheduling. This is a direct 1:1 mapping with Kubernetes affinity rules and supports all the same properties (requiredDuringSchedulingIgnoredDuringExecution, preferredDuringSchedulingIgnoredDuringExecution, etc.). |
| 224 | + |
| 225 | +2. **targetClasses**: A list of resource class identifiers that should be scheduled according to this affinity rule. These identifiers must match resource classes defined in the `resourceClasses` section. |
| 226 | + |
| 227 | +Example: |
| 228 | +```yaml |
| 229 | +agentAffinities: |
| 230 | +  - affinity: |
| 231 | +      nodeAffinity: |
| 232 | +        requiredDuringSchedulingIgnoredDuringExecution: |
| 233 | +          nodeSelectorTerms: |
| 234 | +            - matchExpressions: |
| 235 | +                - key: nx.app/node-role |
| 236 | +                  operator: In |
| 237 | +                  values: |
| 238 | +                    - agent |
| 239 | +    targetClasses: |
| 240 | +      - my-custom-resource-class-small |
| 241 | +      - my-custom-resource-class-medium |
| 242 | +``` |
| 243 | + |
| 244 | +In this example, the `my-custom-resource-class-small` and `my-custom-resource-class-medium` resource classes will only be scheduled on nodes with the label `nx.app/node-role: agent`. |
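|  | + |
|  | +Because the `affinity` block maps directly onto Kubernetes affinity rules, a soft preference can be expressed with `preferredDuringSchedulingIgnoredDuringExecution` instead of a hard requirement. The sketch below assumes a hypothetical node label `example.com/disk-type: ssd`; the weight of 50 is an arbitrary illustrative value within the 1-100 range that Kubernetes accepts: |
|  | + |
|  | +```yaml |
|  | +agentAffinities: |
|  | +  - affinity: |
|  | +      nodeAffinity: |
|  | +        # Soft rule: the scheduler prefers matching nodes but may use others |
|  | +        preferredDuringSchedulingIgnoredDuringExecution: |
|  | +          - weight: 50 # illustrative value |
|  | +            preference: |
|  | +              matchExpressions: |
|  | +                - key: example.com/disk-type # hypothetical label |
|  | +                  operator: In |
|  | +                  values: |
|  | +                    - ssd |
|  | +    targetClasses: |
|  | +      - my-custom-resource-class-large |
|  | +``` |
|  | + |
|  | +With a preferred rule, agents of the targeted class can still be scheduled on other nodes when no matching node has capacity. |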
| 245 | + |
| 246 | +## Troubleshooting |
| 247 | + |
| 248 | +If you encounter issues with the resource class configuration: |
| 249 | + |
| 250 | +1. Verify that both the `name` and `path` properties are correctly set in your Helm values |
| 251 | +2. Check that the ConfigMap exists in the correct namespace |
| 252 | +3. Ensure the key specified in the `path` property exists in the ConfigMap |
| 253 | +4. Validate that your resource class configuration follows the correct format |
| 254 | +5. Check the logs of the Nx Cloud API and Aggregator pods for any error messages related to resource class configuration |
| 255 | + |