40 changes: 40 additions & 0 deletions TERMINOLOGY.md
@@ -0,0 +1,40 @@
# tracebloc terminology

The single source of truth for the words we use — in the product UI, docs, website, decks, and support. One concept, one word. The goal: every person and every page names things the same way.

> **Status: seed.** Lukas owns this file. Add terms as they come up; we run it against the team so we stay consistent. The **open questions** at the bottom need a decision before we hard-enforce them.

## How to use it

- Use the **Preferred** term. Avoid everything in **Don't use**.
- One concept = one word. If two words mean the same thing, we pick one and retire the other.
- Applies everywhere customer- or team-facing: docs, app UI, website, decks, support replies, and code-facing copy (comments, log lines).

## Core terms

| Concept | Preferred | Don't use | Definition |
|---|---|---|---|
| The software you run on your infra (and the environment it gives you) | **workspace** | client (as a noun), agent, node, box, instance, deployment | tracebloc's software running on your own infrastructure — your private, dedicated AI environment, where you invite contributors to train models on your data. |
| The credential that connects it to the platform | **Client ID** | client key, token, API key | Created on the clients page; identifies your workspace to the platform. ("client" survives **only** here, matching the current UI.) |
| The hosted tracebloc service | **the platform** | the cloud, the server, the backend, SaaS | The hosted side (ai.tracebloc.io) that contributors connect through. |
| The user's own servers / laptop | **your infrastructure** | your box, your environment, on-prem (as a noun) | The hardware the workspace runs on — owned and controlled by the user. |
| The person who deploys & owns it | **workspace owner** | admin, host, customer, "the user" | The person who deploys the workspace, ingests data, creates use cases, and controls what's shared. |
| The invited model builder | **contributor** | vendor, participant, expert, "the user" | An invited, whitelisted data scientist who submits and trains models — and never sees the raw data. |
| The collaborative task | **use case** | project, challenge, competition | A task defined from your datasets that contributors build models for. |
| The user's data | **dataset** | data source, "data set" (two words) | Training and test data ingested and staged locally. |
| Bringing data in | **ingest** | upload, import, load | Staging a dataset locally for use cases. (Raw data never leaves your infrastructure.) |
| Connection status | **Online / Offline** | connected/disconnected, up/down | Whether the workspace has an active secure connection to the platform. |

## Words to retire

- **"box"** — casual and vague. Say **your infrastructure**.
- **"the cloud"** — implies the data leaves. Say **the platform** (hosted side) or **your infrastructure** (the user's side).
- **"upload your data"** — implies the data leaves. Say **ingest** / **stage**.
- **"vendor"** for model builders — say **contributor**.
- **"Tracebloc"** capitalized mid-sentence — the brand is lowercase **tracebloc** (except at the start of a sentence or in titles).

## Open questions (decide before enforcing)

1. ✅ **client vs. workspace — DECIDED 2026-06-05 (Lukas): use _workspace_.** Option (b): *workspace* everywhere user-facing; *client* survives only in **Client ID** (the credential). The Environment Setup docs were swept to match. **Residual gap:** the app UI still shows "client" / "clients page" — a UI rename is the follow-up so docs and product fully agree.
2. **workspace owner vs. admin vs. data owner** — what do we call the deploying person, especially in the UI ("admin panel")?
3. **contributor vs. data scientist** — is *contributor* the term in both product and marketing?
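
One way to run these rules against drafts is a small script that flags retired words. The sketch below is illustrative only: the `RETIRED_TERMS` mapping is a hypothetical subset of the "Don't use" column, and the function name is made up for this example rather than taken from any existing tooling.

```python
import re

# Hypothetical subset of the "Don't use" column, mapped to the preferred term.
RETIRED_TERMS = {
    "box": "your infrastructure",
    "the cloud": "the platform",
    "vendor": "contributor",
    "data set": "dataset",
    "upload": "ingest",
}


def find_retired_terms(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, retired_term, preferred_term) for each hit."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for retired, preferred in RETIRED_TERMS.items():
            # Whole-word, case-insensitive match, so "sandbox" does not match "box".
            if re.search(rf"\b{re.escape(retired)}\b", line, flags=re.IGNORECASE):
                hits.append((line_number, retired, preferred))
    return hits


sample = "Upload your data to the cloud.\nInvite a contributor to your workspace."
for line_number, retired, preferred in find_retired_terms(sample):
    print(f"line {line_number}: replace '{retired}' with '{preferred}'")
```

A check like this only catches literal matches; context-dependent entries such as "client (as a noun)" still need a human reviewer.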
59 changes: 47 additions & 12 deletions create-use-case/prepare-dataset.mdx
@@ -1,18 +1,18 @@
---
title: "Prepare Data"
description: "Learn how to prepare and ingest your datasets into tracebloc using containerized data ingestors. Complete guide for CSV, image, and text data with Kubernetes deployment steps."

---

## Overview

Make your data available to the Kubernetes cluster so it can be used for training and evaluation. Whether your client runs on Azure, AWS, Google Cloud, or a local Minikube setup, the process of ingesting datasets works the same way.

The data ingestor is a lightweight service that bridges your raw data and the cluster's persistent storage. It comes with ready-made templates (CSV, images, text) that you can use as starting points and customize for your own dataset. Because the ingestion step is containerized, the ingestor validates data format and schema, enforces consistency, and transfers the dataset securely into the cluster's SQL storage, where it becomes accessible to all training and evaluation jobs.

This guide covers:
- Customizing ingestor templates for different data types (CSV, images, text)
- Deploying the data ingestor for training and test data using Kubernetes
- Managing datasets through the tracebloc interface

**IMPORTANT** Make sure that the data format and ML task are supported and that data standards are met by reviewing the [docs](/create-use-case/prerequisites). You must run the process twice: once to ingest training data and once to ingest test data.

@@ -20,15 +20,25 @@

You can ingest data into your client in two ways:

- **Declarative YAML (recommended, simpler)** — describe your dataset in ~8 lines of `ingest.yaml`, then `helm install`. No Dockerfile, no custom Python script. The official ingestor image runs it for you. Use this for any dataset that fits a supported category.
- **Custom Python template + Kubernetes Job (advanced)** — clone the [data-ingestors repo](https://github.com/tracebloc/data-ingestors), pick a per-category template script, edit it, build and push a Docker image, then `kubectl apply` an `ingestor-job.yaml`. Use this when the declarative schema can't express what your data needs — e.g. non-trivial preprocessing, a custom validator, or a `BaseProcessor` subclass.

Start with the declarative method below. Drop down to the custom-template flow only if you need it.

## Declarative YAML (recommended)

Describe your dataset in ~8 lines of YAML, then `helm install`. The official ingestor image (published as `ghcr.io/tracebloc/ingestor`) runs it. No Dockerfile, no Python script.

<Note>
**Before you run any commands in this section:** if you installed the client via the one-liner (`bash <(curl -fsSL https://tracebloc.io/i.sh)`), every later `helm upgrade <workspace> tracebloc/client …` **must** include `--reset-then-reuse-values`, otherwise the upgrade drops the values the installer applied and breaks the workspace:

```bash
helm upgrade <workspace> tracebloc/client -n <namespace> --reset-then-reuse-values
```

Append `--version <version-number>` to pin a specific chart version. This caveat only affects upgrades of the parent `tracebloc/client` chart, not the `helm install tracebloc/ingestor` runs below.
</Note>

### 1. Add the chart repo (one-time)

```bash
@@ -36,25 +46,31 @@
helm repo update
```

The `tracebloc/client` parent chart bootstraps the cluster (jobs-manager, MySQL, RBAC). The `tracebloc/ingestor` subchart submits per-dataset ingestion runs against it.

### 2. Stage your data on the cluster's shared PVC

The chart **doesn't transport data into the cluster** — it points at data already accessible to the cluster's shared PVC (`client-pvc` by default, mounted at `/data/shared/` inside the ingestor Pod). Before installing, get your raw files there.

For a single-node workspace (the default install), the PVC is backed by a host directory the installer created at `~/.tracebloc/<workspace>/data/`. Drop your files into a per-dataset subdirectory:

```bash
# Host path on the machine where the tracebloc client is installed.
# Pick a <prefix> per dataset — it becomes the path you reference in ingest.yaml.
mkdir -p ~/.tracebloc/<workspace>/data/<prefix>
cp -R LOCAL_PATH/images ~/.tracebloc/<workspace>/data/<prefix>/
cp LOCAL_PATH/labels.csv ~/.tracebloc/<workspace>/data/<prefix>/
```

Inside the ingestor Pod those files appear at `/data/shared/<prefix>/...` — that's what you'll put in `ingest.yaml` below.

<Note>
For multi-node or EKS deployments where the PVC isn't backed by a local host path, use a throwaway `kubectl cp` Pod or a cloud-storage init container instead. See the [client ingestor README](https://github.com/tracebloc/client/blob/develop/ingestor/README.md#stage-your-data-on-the-shared-pvc) for those recipes.
</Note>
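
The host-to-Pod path mapping for a single-node install can be sketched in a few lines of Python. This is an illustration, not part of the chart: the function names, the home directory, and the workspace name are hypothetical, and the only facts taken from this step are the host layout `~/.tracebloc/<workspace>/data/` and the mount point `/data/shared`.

```python
from pathlib import PurePosixPath

POD_MOUNT = PurePosixPath("/data/shared")  # PVC mount point inside the ingestor Pod


def host_data_dir(home: PurePosixPath, workspace: str) -> PurePosixPath:
    """Host directory that backs the PVC on a single-node install."""
    return home / ".tracebloc" / workspace / "data"


def to_pod_path(host_path: PurePosixPath, home: PurePosixPath, workspace: str) -> PurePosixPath:
    """Translate a staged host path into the path the ingestor Pod sees.

    Raises ValueError if host_path is not inside the workspace data directory.
    """
    relative = host_path.relative_to(host_data_dir(home, workspace))
    return POD_MOUNT / relative


home = PurePosixPath("/home/alice")  # hypothetical home directory
staged = host_data_dir(home, "my-workspace") / "cats_dogs" / "labels.csv"
print(to_pod_path(staged, home, "my-workspace"))
```

The value printed is the kind of path that belongs in `ingest.yaml`, since the ingestor resolves paths inside the Pod rather than on the host.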

### 3. Write your `ingest.yaml`

The example below is for `image_classification`. **Other categories require different fields** — e.g. `tabular_classification` has no `images:` and instead needs a typed `schema:` block. Don't copy this one blindly; grab the matching file from [`examples/yaml/`](https://github.com/tracebloc/data-ingestors/tree/master/examples/yaml) (one per category) and edit from there. Per-category sample data and READMEs live under [`templates/`](https://github.com/tracebloc/data-ingestors/tree/master/templates).

```yaml
apiVersion: tracebloc.io/v1
@@ -67,34 +83,53 @@
label: label
```

The top-level shape (`apiVersion`, `kind`, `category`, `table`, `intent`, `label`) is the same for every category; the `category` field picks the validator set, file-extension defaults, and column conventions. The data-source fields (`csv:`, `images:`, `schema:`, …) vary per category. The paths are *paths inside the ingestor Pod*, which is the PVC mount you populated in step 2.
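
Before installing, a quick local check of the top-level shape can catch a missing key early. The sketch below is a hypothetical pre-flight helper, not the validation that runs on the cluster: it works on an already-parsed mapping, knows only the six key names listed above, and assumes `train` and `test` are the only intents. The `kind` value is left as a placeholder because it depends on the example file you start from.

```python
REQUIRED_KEYS = ("apiVersion", "kind", "category", "table", "intent", "label")
ALLOWED_INTENTS = {"train", "test"}  # assumption: the two intents used in this guide


def check_top_level(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    intent = config.get("intent")
    if intent is not None and intent not in ALLOWED_INTENTS:
        problems.append(f"intent must be one of {sorted(ALLOWED_INTENTS)}, got {intent!r}")
    return problems


# Parsed ingest.yaml as a plain dict; category-specific data-source fields omitted.
config = {
    "apiVersion": "tracebloc.io/v1",
    "kind": "<kind from your example file>",  # placeholder, not a real value
    "category": "image_classification",
    "table": "cats_dogs_train",
    "intent": "train",
    "label": "label",
}
print(check_top_level(config))
```

A check like this says nothing about the category-specific fields (`csv:`, `images:`, `schema:`); those are validated by the ingestor itself.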

### 4. Install once per dataset

The ingestor runs once: validates your data, copies files into the destination directory on the PVC, inserts rows into MySQL, sends metadata to the tracebloc backend, then exits. **Run it twice per dataset** — once with `intent: train`, once with `intent: test` — using distinct `table:` names. The example below shows both releases:

```bash
# Train release — points at the ingest.yaml from step 3 (table: cats_dogs_train, intent: train)
helm install cats-dogs-train tracebloc/ingestor \
  --namespace <workspace> \
  --set-file ingestConfig=./ingest-train.yaml

# Test release — same shape, with table: cats_dogs_test and intent: test
helm install cats-dogs-test tracebloc/ingestor \
  --namespace <workspace> \
  --set-file ingestConfig=./ingest-test.yaml
```

Each `helm install` is a separate release (the first argument is the release name), so the two runs don't collide. The ingestor Pod picks up `CLIENT_ID` / `CLIENT_PASSWORD` automatically from the Kubernetes Secret the parent `tracebloc/client` chart created in `<workspace>` at install time — you don't pass credentials on the `helm install` command.
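
If you script the two releases, the pairing of release name and config file can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the helper names and the `<dataset>-<intent>` / `./ingest-<intent>.yaml` naming scheme are conventions chosen for this example, and the code only builds the command lines without running them.

```python
import shlex


def helm_install_argv(release: str, namespace: str, config_file: str) -> list[str]:
    """Build the argument vector for one ingestor release."""
    return [
        "helm", "install", release, "tracebloc/ingestor",
        "--namespace", namespace,
        "--set-file", f"ingestConfig={config_file}",
    ]


def release_commands(dataset: str, namespace: str) -> list[list[str]]:
    """One release per intent, each with its own release name and config file."""
    return [
        helm_install_argv(f"{dataset}-{intent}", namespace, f"./ingest-{intent}.yaml")
        for intent in ("train", "test")
    ]


for argv in release_commands("cats-dogs", "my-workspace"):
    print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would execute the installs; building the list first keeps the release names easy to inspect.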

<Warning>
**Validation error like `'<your_category>' is not one of [...]` or `Additional properties are not allowed (<field> was unexpected)`?** This comes from the cluster's `jobs-manager` validating against its own bundled schema at submit time — the deployed schema is older than the ingestor image you're installing. `helm repo update` won't fix it (that only refreshes the local chart index, not the running server). The fix is on the cluster side: upgrade the parent chart so jobs-manager redeploys with the current schema.

```bash
helm upgrade <workspace> tracebloc/client \
-n <workspace> --reset-then-reuse-values
```

Then re-run the `helm install` command above.
</Warning>

Full chart docs (data-staging recipe, schema, every category, update model, verification, override knobs) → [client ingestor README](https://github.com/tracebloc/client/blob/develop/ingestor/README.md).

## Custom Python template (advanced)

Use this flow when the declarative schema can't express what your data needs — typically when you have non-trivial preprocessing logic, a custom validator, or a `BaseProcessor` subclass. The sections below — Quick Setup and Detailed Setup — both describe this advanced path.

## Quick Setup

Use this quick setup if you already have an ingestor configured and just want to switch datasets or toggle between training and testing. If you are setting up for the first time, go to the next section for the detailed walkthrough.

### Steps

1. Pick a template script and edit it, e.g. `/templates/tabular_classification/tabular_classification.py`:
   - Update the CSV options and `data_path`
   - Only for tabular data: update the schema
   - Set `schema` and the `CSVIngestor()` parameters (category, intent, label_column, etc.) to match the data type, task, and train/test purpose

```python
ingestor = CSVIngestor(
@@ -128,9 +163,9 @@

### 1. Configure a Template

This section walks you through the step-by-step setup of a data ingestor. You will clone the repository, select the right template for your data type, and customize it to match your task. Follow this guide if you are setting up an ingestor for the first time or need full control beyond the quick setup.

### Clone the Data Ingestor Repository

Clone the public [Data Ingestor GitHub repository](https://github.com/tracebloc/data-ingestors):

@@ -195,14 +230,14 @@
...
```

`Database`, `APIClient`, and other values are configured automatically from the environment variables defined in `ingestor-job.yaml`.

- `config.LABEL_FILE`: Path to the local CSV label file
- `config.BATCH_SIZE`: Batch size used during ingestion

### Customize a Template

Templates provide a starting point, but every dataset has its own format and labels. In this step you adapt the template to your data by tuning CSV ingestion options and setting the ingestor parameters (category, label column, intent, data path and schema). The following example in `templates/tabular_classification/tabular_classification.py` shows how to ingest a tabular dataset, but the setup works the same way for image or text data.

#### Needed for Tabular Data: Define Schema

@@ -255,7 +290,7 @@
```

#### Set CSV ingestion options
Customize parsing, memory handling, and data cleaning with the `csv_options` dictionary:

```python
csv_options = {
@@ -270,9 +305,9 @@
}
```

#### Set Up the Ingestor

Define the Ingestor instance with the required configuration. See the tabular data example below:

```python
ingestor = CSVIngestor(
@@ -304,7 +339,7 @@

### Docker Hub Setup (first-time users)

The cluster pulls your ingestor image from a public Docker registry, so you need an account before you can push. If you already have one, skip to [Edit Dockerfile](#edit-dockerfile).

1. **Create a Docker Hub account** at [hub.docker.com/signup](https://hub.docker.com/signup) and verify your email.
2. **Log in from your terminal** so the `docker push` command can authenticate:
@@ -313,18 +348,18 @@
docker login
```

3. **Push the data ingestor image** to your account using the build/push commands in the next section. The image name takes the form `<your-docker-username>/<image-name>:<tag>` — the username segment must match the account you just created.

4. **Make the image public** so the cluster can pull it without credentials:
- Go to [hub.docker.com/repositories](https://hub.docker.com/repositories), open the repository you just pushed.
- Click **Settings → Visibility settings → Make public**.

Keeping the image private is also fine, but then you must create a Kubernetes `imagePullSecret` named `regcred` in the client namespace (the `ingestor-job.yaml` already references it).

### Place data files on the client host

Datasets are **not** baked into the Docker image. They live on the client host in the per-workspace data directory and are mounted into the ingestor pod through the shared PVC (`client-pvc` → `/data/shared`).

Copy your dataset into the client's data directory, where `<workspace>` is the workspace name you chose during client install (which is also the Helm release name and the Kubernetes namespace — the chart uses the same value for all three). The directory `~/.tracebloc/<workspace>/data/` is created automatically by the installer; just drop your files into it:

```bash
# Host path on the machine where the tracebloc client is installed.
@@ -333,20 +368,20 @@
cp LOCAL_PATH/labels.csv ~/.tracebloc/<workspace>/data/
```

Inside the ingestor pod this directory is mounted at `/data/shared`, so the same files appear as `/data/shared/images/...` and `/data/shared/labels.csv`. Set `SRC_PATH` and `LABEL_FILE` in `ingestor-job.yaml` to point at those in-pod paths (see [Configure Kubernetes](#3-configure-kubernetes) below).

For tabular data the same rule applies — drop the single `labels.csv` (with features and labels) into `~/.tracebloc/<workspace>/data/`.

### Edit Dockerfile

The Dockerfile only needs to package the ingestion script — the dataset is mounted at runtime, so do **not** `COPY` data into the image:

```dockerfile
# Copy the ingestion script into /app
COPY templates/tabular_classification/tabular_classification.py /app/ingestor.py
```

If the cluster enforces the `restricted` Pod Security Standard (see [Run as non-root](#run-as-non-root) below), also add a non-root user to the Dockerfile, **before** the `# Set the entrypoint` line:

```dockerfile
RUN groupadd -g 1000 app && \
@@ -445,14 +480,14 @@
- `image`, your Docker image (`imagePullPolicy: Always` for Docker Hub, `IfNotPresent` for a local image)
- `CLIENT_ID`, `CLIENT_PASSWORD` from the [tracebloc client view](https://ai.tracebloc.io/clients)
- `TABLE_NAME`, unique per dataset and without spaces. Train and test data must use different names.
- `LABEL_FILE`, path inside the ingestor pod (under `/data/shared`) to the CSV with file paths and labels; it must match the location of the file you placed in `~/.tracebloc/<workspace>/data/`
- `SRC_PATH`, root inside the pod where the dataset directory is mounted (`/data/shared`)
- `BATCH_SIZE`, the number of entries sent to the server per request. Optional; defaults to 4000. Keep it consistent across data types. The right value depends on available memory rather than on per-item size (for example image size), and a value that is too large can exhaust memory. It was tested up to 10,000; 5,000 is a safe value for most systems.
- `LOG_LEVEL`, "WARNING" for warnings and errors, "INFO" for all logs, "ERROR" for errors only
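
The constraints in this list can be expressed as a small validation routine. The sketch below is illustrative and is not how the ingestor templates read their configuration: `JobSettings` and `load_settings` are hypothetical names, the fallback to `"WARNING"` when `LOG_LEVEL` is unset is an assumption, and only the 4000 default for `BATCH_SIZE` comes from the list above.

```python
from dataclasses import dataclass

ALLOWED_LOG_LEVELS = {"WARNING", "INFO", "ERROR"}


@dataclass(frozen=True)
class JobSettings:
    table_name: str
    label_file: str
    src_path: str
    batch_size: int
    log_level: str


def load_settings(env: dict) -> JobSettings:
    """Validate job settings taken from a mapping of environment variables."""
    table_name = env["TABLE_NAME"]
    if " " in table_name:
        raise ValueError("TABLE_NAME must not contain spaces")

    src_path = env["SRC_PATH"].rstrip("/")
    label_file = env["LABEL_FILE"]
    if not label_file.startswith(src_path + "/"):
        raise ValueError("LABEL_FILE must point inside SRC_PATH")

    # BATCH_SIZE is optional; 4000 is the default named in the list above.
    batch_size = int(env.get("BATCH_SIZE", "4000"))
    if batch_size <= 0:
        raise ValueError("BATCH_SIZE must be a positive integer")

    # Assumption: fall back to "WARNING" when LOG_LEVEL is unset.
    log_level = env.get("LOG_LEVEL", "WARNING")
    if log_level not in ALLOWED_LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {sorted(ALLOWED_LOG_LEVELS)}")

    return JobSettings(table_name, label_file, src_path, batch_size, log_level)


settings = load_settings({
    "TABLE_NAME": "cats_dogs_train",
    "SRC_PATH": "/data/shared",
    "LABEL_FILE": "/data/shared/labels.csv",
})
print(settings.batch_size, settings.log_level)
```

Inside a running pod, `load_settings(dict(os.environ))` would apply the same checks to the values set in `ingestor-job.yaml`.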

### 4. Deploy

Run the ingestor as a Kubernetes Job:

```bash
kubectl apply -f ingestor-job.yaml -n <workspace>
Expand All @@ -468,7 +503,7 @@

### Run as non-root

If the namespace enforces the `restricted` [Pod Security Standard](https://kubernetes.io/docs/concepts/security/pod-security-standards/), `kubectl apply` will be admitted but the pod will be rejected with a warning like:

```text
Warning: would violate PodSecurity "restricted:latest":
Expand All @@ -494,7 +529,7 @@
type: RuntimeDefault
```

**2. Run the container as a non-root user.** Add the following to the Dockerfile **before** the `# Set the entrypoint` line so the image ships with a UID that satisfies `runAsNonRoot: true`:

```dockerfile
RUN groupadd -g 1000 app && \
Expand All @@ -506,7 +541,7 @@

Rebuild and push the image, then re-apply the job.
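
One way to script that last step is sketched below. `redeploy_ingestor` is a hypothetical helper, and the image reference and Job name are placeholders you supply. It assumes the manifest is `ingestor-job.yaml` in the current directory, and it deletes the existing Job before re-applying because a Job's pod template cannot be changed in place.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): rebuild, push, and re-run the ingestor Job.
# Arguments: <image reference> <job name> <namespace>
redeploy_ingestor() {
  local image="$1" job="$2" namespace="$3"
  # Build from the Dockerfile in the current directory and push the result.
  docker build -t "$image" . &&
    docker push "$image" &&
    # Remove the old Job so the updated manifest can be applied cleanly.
    kubectl delete job "$job" -n "$namespace" --ignore-not-found &&
    kubectl apply -f ingestor-job.yaml -n "$namespace"
}
```

Each step runs only if the previous one succeeded, so a failed build or push leaves the existing Job untouched.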

The data ingestor always runs a validation step before it ingests data and moves files.


#### Verify Deployment
Expand All @@ -528,7 +563,7 @@
**Interface displays:**
- Dataset name, ID, and record count
- Data type (Tabular, Image, Text) and purpose (Training/Testing)
- Namespace and GPU requirements

## Best Practices
- Deploy jobs for training and testing simultaneously using different job names
Expand Down
1 change: 1 addition & 0 deletions create-use-case/templates.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
description: "Ready-made data ingestion templates for every supported task — clone, configure, deploy."
---

Each task tracebloc supports comes with a runnable data-ingestion template — a working `Dockerfile`, `ingestor.py`, and `ingestor-job.yaml` you can copy, point at your data, and ship.

## Available templates

Expand All @@ -11,9 +11,10 @@
|---|---|
| Image classification | [`templates/image_classification`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/image_classification) |
| Object detection | [`templates/object_detection`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/object_detection) |
| Keypoint detection | [`templates/keypoint_detection`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/keypoint_detection) |
| Semantic segmentation | [`templates/semantic_segmentation`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/semantic_segmentation) |
| Text classification | [`templates/text_classification`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/text_classification) |
| Masked language modeling | [`templates/masked_language_modeling`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/masked_language_modeling) |
| Tabular classification | [`templates/tabular_classification`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/tabular_classification) |
| Tabular regression | [`templates/tabular_regression`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/tabular_regression) |
| Time series forecasting | [`templates/time_series_forecasting`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/time_series_forecasting) |
Expand Down
17 changes: 15 additions & 2 deletions docs.json
Original file line number Diff line number Diff line change
Expand Up @@ -75,9 +75,22 @@
{
"group": "Environment Setup",
"pages": [
"environment-setup/setup-guide",
"environment-setup/overview",
"environment-setup/quickstart",
{
"group": "Deployment environments",
"pages": [
"environment-setup/deployment-environments",
"environment-setup/deploy-local",
"environment-setup/deploy-bare-metal",
"environment-setup/eks-client-deployment-guide",
"environment-setup/deploy-aks",
"environment-setup/deploy-openshift"
]
},
"environment-setup/configuration",
"environment-setup/eks-client-deployment-guide",
"environment-setup/operations",
"environment-setup/security",
"environment-setup/troubleshooting"
]
},
Expand Down
16 changes: 8 additions & 8 deletions environment-setup/configuration.mdx
Original file line number Diff line number Diff line change
@@ -1,22 +1,22 @@
---
title: "Configuration"
description: "Customize your tracebloc workspace — environment variables, cluster management, GPU support, and manual Helm deployment."

---

The installer uses sensible defaults. This page covers everything you can change — from cluster naming and port mapping to GPU configuration, manual Helm deployment, and day-to-day cluster management.
The installer uses sensible defaults; this page covers what you can change.

**Installed with the one-liner?** See [Installer Options](#installer-options), [Cluster Management](#cluster-management), and [GPU Support](#gpu-support). **Deploying into your own cluster with Helm** (EKS, AKS, bare-metal)? Jump to [Manual Deployment](#manual-deployment).

## Installer Options

Override defaults by setting environment variables before the install command. Useful when you need a custom cluster name, multiple worker nodes, or non-standard ports.
Override defaults by setting environment variables before the install command. Useful for a custom cluster name, extra worker nodes, or a different data directory.

| Variable | Default | Description |
|----------|---------|-------------|
| `CLUSTER_NAME` | `tracebloc` | Name of the k3d cluster |
| `SERVERS` | `1` | Number of control-plane nodes |
| `AGENTS` | `1` | Number of worker nodes |
| `K8S_VERSION` | `v1.29.4-k3s1` | k3s image tag |
| `HTTP_PORT` | `80` | Host port mapped to cluster HTTP ingress |
| `HTTPS_PORT` | `443` | Host port mapped to cluster HTTPS ingress |
| `HOST_DATA_DIR` | `~/.tracebloc` | Persistent data directory on host |

Example — custom cluster name with two worker nodes:
Expand All @@ -42,10 +42,10 @@

### View logs

The jobs manager is the main tracebloc process. Check its logs when debugging connectivity or job execution issues:

```bash
kubectl logs -n <workspace> -l app=tracebloc-jobs-manager
kubectl logs -n <workspace> -l app=manager
```

### Useful commands
Expand All @@ -64,13 +64,13 @@

## GPU Support

The installer auto-detects GPU hardware and configures the cluster accordingly. No manual setup required on Linux — the installer handles drivers, container toolkit, and Kubernetes device plugin.
GPU is automatic on Linux — the installer detects your hardware and sets up drivers, the container toolkit, and the Kubernetes device plugin.

### NVIDIA (Linux)

Fully automatic. The installer:

1. Detects NVIDIA GPUs via `nvidia-smi` or `lspci`
2. Installs drivers if missing (Ubuntu, RHEL/CentOS, Arch)
3. Installs the NVIDIA Container Toolkit and configures Docker
4. Deploys the NVIDIA k8s device plugin into the cluster
Expand All @@ -80,11 +80,11 @@

### AMD (Linux)

Auto-detected. ROCm is installed automatically on Ubuntu and RHEL/CentOS. A logout/login may be needed for full GPU access.

### macOS

CPU only. Docker Desktop on macOS does not support GPU passthrough. For GPU workloads, deploy on a Linux machine with NVIDIA GPUs or use [AWS (EKS)](/environment-setup/eks-client-deployment-guide).

### Windows

Expand All @@ -94,7 +94,7 @@

Skip the installer entirely. Use this if you already have a Kubernetes cluster, need custom resource limits, or want full control over the Helm deployment.

A single unified chart — **`tracebloc/client`** — supports AKS, EKS, bare-metal, and OpenShift. Platform behaviour is selected via values overrides; reference defaults live in the repo at [`client/ci/{aks,eks,bm,oc}-values.yaml`](https://github.com/tracebloc/client/tree/main/client/ci).
A single chart — **`tracebloc/client`** — supports AKS, EKS, bare-metal, and OpenShift; choose your platform via values overrides. Reference defaults live at [`client/ci/{aks,eks,bm,oc}-values.yaml`](https://github.com/tracebloc/client/tree/main/client/ci).

### Add the Helm repository

Expand Down Expand Up @@ -242,7 +242,7 @@

#### Docker Registry

The chart pulls the client image from a container registry — credentials are required in production. Use a token, not a plaintext password.

```yaml
dockerRegistry:
Expand All @@ -266,7 +266,7 @@

#### Auto-upgrade (on by default)

Releases of chart `1.3.0+` install a `<release>-auto-upgrade` CronJob that polls `https://tracebloc.github.io/client` daily and runs `helm upgrade --reset-then-reuse-values` whenever a newer chart version is published. Closes [tracebloc/client#69](https://github.com/tracebloc/client/issues/69) — older deployed clients no longer drift from the latest secure release.
Releases of chart `1.3.0+` install a `<release>-auto-upgrade` CronJob that polls `https://tracebloc.github.io/client` daily and runs `helm upgrade --reset-then-reuse-values` whenever a newer chart version is published — so clients auto-update instead of staying pinned to the version they were installed with.

```yaml
autoUpgrade:
Expand All @@ -279,11 +279,11 @@
timeout: "10m"
```

The CronJob's ServiceAccount is bound to the built-in `cluster-admin` ClusterRole because the chart templates cluster-scoped resources (PriorityClass, StorageClass, ClusterRoleBinding, optionally Namespace). Disable if you need a manual approval gate on upgrades.
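
To see whether the auto-upgrade CronJob exists for a release, and when it was last scheduled, something like the following may help. `auto_upgrade_status` is a hypothetical helper; the CronJob name follows the `<release>-auto-upgrade` pattern described above, and the namespace is assumed to be the one the release was installed into.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): show the auto-upgrade CronJob of a release.
# Arguments: <release name> <namespace>
auto_upgrade_status() {
  local release="$1" namespace="$2"
  # The CronJob is named after the Helm release, with an "-auto-upgrade" suffix.
  kubectl get cronjob "${release}-auto-upgrade" -n "$namespace"
}
```

A "not found" error from `kubectl` would suggest that the chart version predates `1.3.0` or that auto-upgrade has been disabled.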

#### NetworkPolicy hardening for training pods

Training pods run untrusted ML code. The chart can apply a NetworkPolicy that denies ingress and restricts egress to DNS + external HTTPS only — blocking pod-to-pod, MySQL, and Kubernetes API access from the training pod.

```yaml
networkPolicy:
Expand All @@ -309,12 +309,12 @@
Leave `enabled: false` on clusters without an enforcing CNI — silently having no protection is worse than explicitly disabling it.

<Warning>
The chart's training-pod egress lockdown only blocks traffic if your CNI enforces NetworkPolicy. Verify your CNI before relying on it.

</Warning>

#### Resource Monitor and node-agents namespace

The `tracebloc-resource-monitor` DaemonSet collects node-level CPU/memory metrics. It mounts `hostPath` volumes (`/proc`, `/sys`) which Pod Security Admission's `restricted` profile bans — so the chart isolates it in a dedicated **privileged** namespace (default `tracebloc-node-agents`).

```yaml
resourceMonitor: true # set false on clusters where metrics-server cannot be installed
Expand All @@ -324,7 +324,7 @@
name: tracebloc-node-agents
```

When `create: false`, create the namespace yourself with the required PSA labels:

```bash
kubectl create namespace tracebloc-node-agents
Expand All @@ -338,7 +338,7 @@

#### Pod Security Admission labels

Training Jobs run untrusted user-supplied ML code. The chart can create the release namespace with Pod Security Admission `warn`/`audit`/`enforce` labels at the `restricted` profile for defense-in-depth:

```yaml
namespace:
Expand All @@ -349,7 +349,7 @@
enforce: restricted # set "" for bare-metal hostPath installs
```

When `create: false` (default) and you want PSA labels on an existing namespace:

```bash
kubectl label namespace <workspace> \
Expand All @@ -374,7 +374,7 @@

#### PriorityClass and PodDisruptionBudgets

The chart pins the MySQL pod with a `tracebloc-data-plane` PriorityClass (value `1000000`) so it survives node-level OOM and scheduling pressure, and applies PDBs to MySQL and the jobs manager. Override only if you run a multi-replica MySQL externally:

```yaml
priorityClass:
Expand All @@ -389,7 +389,7 @@

### Deploy

Install the chart into a new namespace:

```bash
helm upgrade --install <workspace> tracebloc/client \
Expand Down Expand Up @@ -420,7 +420,7 @@
helm uninstall <workspace> -n <workspace>
```

PVCs and the PriorityClass are annotated `helm.sh/resource-policy: keep` so your data and shared cluster resources survive uninstall. To remove them too:

```bash
kubectl delete pvc --all -n <workspace>
Expand All @@ -440,10 +440,10 @@

## Security

Tracebloc is designed so your data never has to leave your network. Here's how:

- **Data stays local.** Training data never leaves your infrastructure. Only metadata and metrics are shared with the platform.
- **Encrypted.** All communication between your workspace and the platform is TLS-encrypted.
- **Isolated.** Training runs in containers with restricted system access. Kubernetes namespaces separate workloads from each other.
- **Scanned.** Submitted models are analyzed for vulnerabilities before execution on your infrastructure.
- **Minimal footprint.** The installer only modifies `~/.tracebloc/` and Docker. No system-wide changes.
54 changes: 54 additions & 0 deletions environment-setup/deploy-aks.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
---
title: "Azure AKS"
description: "Deploy a tracebloc workspace on Azure Kubernetes Service."

---

**When to pick it** — You're on Azure and want managed Kubernetes with autoscaling and GPU node pools.

## Prerequisites

- An AKS cluster, and `kubectl` pointed at it (`az aks get-credentials …`).
- **Helm 3.x**.
- Your **Client ID** and password from the [clients page](https://ai.tracebloc.io/clients).

## Install

```bash
helm repo add tracebloc https://tracebloc.github.io/client
helm repo update
helm show values tracebloc/client > values.yaml # edit per below
helm upgrade --install tracebloc tracebloc/client \
-n tracebloc --create-namespace -f values.yaml
```

Set your credentials in `values.yaml` (see [Configuration → Authentication](/environment-setup/configuration#authentication)).

## Verify

```bash
kubectl get pods -n tracebloc
```

Pods `Running`, your workspace **Online** on the clients page.
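
Rather than polling by hand, the check above could be scripted roughly as follows. `wait_for_workspace` is a hypothetical helper, and the `tracebloc` namespace and 300-second timeout are assumed defaults. Note that pods belonging to completed Jobs never report `Ready`, so the wait may time out if the namespace contains any.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): block until all pods are Ready, then list them.
# Arguments: [namespace, default "tracebloc"] [timeout, default "300s"]
wait_for_workspace() {
  local namespace="${1:-tracebloc}" timeout="${2:-300s}"
  # kubectl wait returns non-zero if any pod is not Ready within the timeout.
  kubectl wait --for=condition=Ready pods --all -n "$namespace" --timeout="$timeout" &&
    kubectl get pods -n "$namespace"
}
```

This only covers the cluster side; whether the workspace shows **Online** still has to be confirmed on the clients page.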

## Environment-specific config

Use Azure Files for shared storage:

```yaml
storageClass:
create: true
provisioner: file.csi.azure.com
parameters:
skuName: Standard_LRS
mountOptions: [dir_mode=0750, file_mode=0640, uid=999, gid=999, mfsymlinks, cache=strict, actimeo=30]
clusterScope: true
```

- **NetworkPolicy:** create the AKS cluster with `--network-policy azure` (Azure NPM) or Calico; otherwise the training-pod egress lockdown is not enforced.
- `metrics-server` is bundled on AKS.
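
For an existing cluster, the NetworkPolicy setting can be read back with the Azure CLI. The sketch below is an illustration under assumptions: `aks_enforces_network_policy` is a hypothetical helper, the resource group and cluster name are yours to supply, and treating an empty, `none`, or `None` value as "no policy engine" is an assumption about how the CLI prints an unset field.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only): succeed if the AKS cluster reports
# a network policy engine. Arguments: <resource group> <cluster name>
aks_enforces_network_policy() {
  local resource_group="$1" cluster="$2" policy
  # Read the configured policy engine from the cluster's network profile.
  policy="$(az aks show --resource-group "$resource_group" --name "$cluster" \
    --query networkProfile.networkPolicy --output tsv)" || return 1
  echo "networkPolicy: ${policy:-none}"
  # Anything other than an empty or "none" value is treated as an enforcing engine.
  case "$policy" in
    ""|none|None) return 1 ;;
  esac
}
```

A non-zero exit status suggests that NetworkPolicy objects would be accepted by the API server but not enforced on this cluster.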

## Production notes

- Use GPU node pools for training workloads; size requests/limits per job.
- Day-2 upgrades and rollbacks: see [Operations](/environment-setup/operations).