diff --git a/TERMINOLOGY.md b/TERMINOLOGY.md
new file mode 100644
index 0000000..1b8c81a
--- /dev/null
+++ b/TERMINOLOGY.md
@@ -0,0 +1,40 @@
+# tracebloc terminology
+
+The single source of truth for the words we use — in the product UI, docs, website, decks, and support. One concept, one word. The goal: every person and every page names things the same way.
+
+> **Status: seed.** Lukas owns this file. Add terms as they come up; we run it against the team so we stay consistent. The **open questions** at the bottom need a decision before we hard-enforce them.
+
+## How to use it
+
+- Use the **Preferred** term. Avoid everything in **Don't use**.
+- One concept = one word. If two words mean the same thing, we pick one and retire the other.
+- Applies everywhere customer- or team-facing: docs, app UI, website, decks, support replies, and code-facing copy (comments, log lines).
+
+## Core terms
+
+| Concept | Preferred | Don't use | Definition |
+|---|---|---|---|
+| The software you run on your infra (and the environment it gives you) | **workspace** | client (as a noun), agent, node, box, instance, deployment | tracebloc's software running on your own infrastructure — your private, dedicated AI environment, where you invite contributors to train models on your data. |
+| The credential that connects it to the platform | **Client ID** | client key, token, API key | Created on the clients page; identifies your workspace to the platform. ("client" survives **only** here, matching the current UI.) |
+| The hosted tracebloc service | **the platform** | the cloud, the server, the backend, SaaS | The hosted side (ai.tracebloc.io) that contributors connect through. |
+| The user's own servers / laptop | **your infrastructure** | your box, your environment, on-prem (as a noun) | The hardware the workspace runs on, owned and controlled by the workspace owner. |
+| The person who deploys & owns it | **workspace owner** | admin, host, customer, "the user" | The person who deploys the workspace, ingests data, creates use cases, and controls what's shared. |
+| The invited model builder | **contributor** | vendor, participant, expert, "the user" | An invited, whitelisted data scientist who submits and trains models — and never sees the raw data. |
+| The collaborative task | **use case** | project, challenge, competition | A task defined from your datasets that contributors build models for. |
+| The user's data | **dataset** | data source, "data set" (two words) | Training and test data ingested and staged locally. |
+| Bringing data in | **ingest** | upload, import, load | Staging a dataset locally for use cases. (Raw data never leaves your infrastructure.) |
+| Connection status | **Online / Offline** | connected/disconnected, up/down | Whether the workspace has an active secure connection to the platform. |
+
+## Words to retire
+
+- **"box"** — casual and vague. Say **your infrastructure**.
+- **"the cloud"** — implies the data leaves. Say **the platform** (hosted side) or **your infrastructure** (their side).
+- **"upload your data"** — implies the data leaves. Say **ingest** / **stage**.
+- **"vendor"** for model builders — say **contributor**.
+- **"Tracebloc"** capitalized mid-sentence — the brand is lowercase **tracebloc** (except at the start of a sentence or in titles).
+
+## Open questions (decide before enforcing)
+
+1. ✅ **client vs. workspace — DECIDED 2026-06-05 (Lukas): use _workspace_.** Option (b): *workspace* everywhere user-facing; *client* survives only in **Client ID** (the credential). The Environment Setup docs were swept to match. **Residual gap:** the app UI still shows "client" / "clients page" — a UI rename is the follow-up so docs and product fully agree.
+2. **workspace owner vs. admin vs. data owner** — what do we call the deploying person, especially in the UI ("admin panel")?
+3. **contributor vs. data scientist** — is *contributor* the term in both product and marketing?
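
The "Words to retire" list in TERMINOLOGY.md lends itself to a mechanical check. The following is a minimal sketch, assuming a shell with `grep -E` and `--include` support (GNU or BSD grep). The function name `find_retired_terms` and the three phrases it scans for are illustrative choices, not part of any tracebloc tooling.

```bash
# Hypothetical helper: list lines under a directory that use retired phrases.
# The phrase list is a hand-picked subset of "Words to retire".
find_retired_terms() {
  local dir="${1:-.}"
  # -r recurse, -n show line numbers, -i ignore case, -E extended regex.
  # Prints each match; returns non-zero when nothing is found (grep's convention).
  grep -rniE --include='*.md' --include='*.mdx' \
    'the cloud|upload your data|your box' "$dir"
}

# Usage: find_retired_terms .
```

A phrase-level scan like this only flags candidates. Terms such as "client" that survive in one sanctioned spot (**Client ID**) still need a human decision.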
diff --git a/create-use-case/prepare-dataset.mdx b/create-use-case/prepare-dataset.mdx
index 05d77a3..aee27d7 100644
--- a/create-use-case/prepare-dataset.mdx
+++ b/create-use-case/prepare-dataset.mdx
@@ -5,7 +5,7 @@ description: "Learn how to prepare and ingest your datasets into tracebloc using
## Overview
-Make your data available to the Kubernetes cluster so it can be used for training and evaluation. Regardless of where your client runs on Azure, AWS, Google Cloud, or a local Minikube setup, the process of ingesting datasets works the same way.
+Make your data available to the Kubernetes cluster so it can be used for training and evaluation. Whether your client runs on Azure, AWS, Google Cloud, or a local Minikube setup, the process of ingesting datasets works the same way.
The data ingestor is a lightweight service that bridges your raw data and the cluster's persistent storage. It comes with ready-made templates (CSV, images, text) that you can use as starting points and customize for your own dataset. By containerizing the ingestion step, the ingestor validates data format and schema, enforces consistency, and transfers the dataset securely into cluster's SQL storage where it becomes accessible to all training and evaluation jobs.
@@ -29,6 +29,16 @@ Start with the declarative method below. Drop down to the custom-template flow o
Describe your dataset in ~8 lines of YAML, then `helm install`. The official ingestor image (published as `ghcr.io/tracebloc/ingestor`) runs it. No Dockerfile, no Python script.
+
+**Before you run any commands in this section:** if you installed the client via the one-liner (`bash <(curl -fsSL https://tracebloc.io/i.sh)`), every later `helm upgrade tracebloc/client …` **must** include `--reset-then-reuse-values`, otherwise the upgrade drops the values the installer applied and breaks the workspace:
+
+```bash
+helm upgrade tracebloc/client -n --reset-then-reuse-values
+```
+
+Append `--version ` to pin a specific chart version. This caveat only affects upgrades of the parent `tracebloc/client` chart, not the `helm install tracebloc/ingestor` runs below.
+
+
### 1. Add the chart repo (one-time)
```bash
@@ -38,19 +48,25 @@ helm repo update
The `tracebloc/client` parent chart bootstraps the cluster (jobs-manager, MySQL, RBAC). The `tracebloc/ingestor` subchart submits per-dataset ingestion runs against it.
-
-If you installed the client via the one-liner (`bash <(curl -fsSL https://tracebloc.io/i.sh)`), use `--reset-then-reuse-values` so the helm upgrade doesn't drop the values the installer applied:
+### 2. Stage your data on the cluster's shared PVC
+
+The chart **doesn't transport data into the cluster** — it points at data already accessible to the cluster's shared PVC (`client-pvc` by default, mounted at `/data/shared/` inside the ingestor Pod). Before installing, get your raw files there.
+
+For a single-node workspace (the default install), the PVC is backed by a host directory the installer created at `~/.tracebloc//data/`. Drop your files into a per-dataset subdirectory:
```bash
-helm upgrade tracebloc/client -n --reset-then-reuse-values
+# Host path on the machine where the tracebloc client is installed.
+# Pick a per dataset — it becomes the path you reference in ingest.yaml.
+mkdir -p ~/.tracebloc//data/
+cp -R LOCAL_PATH/images ~/.tracebloc//data//
+cp LOCAL_PATH/labels.csv ~/.tracebloc//data//
```
-Append `--version ` to pin a specific chart version.
-
-
-### 2. Stage your data on the cluster's shared PVC
+Inside the ingestor Pod those files appear at `/data/shared//...` — that's what you'll put in `ingest.yaml` below.
-The chart **doesn't transport data into the cluster** — it points at data already accessible to the cluster's shared PVC (`client-pvc` by default, mounted at `/data/shared/` inside the ingestor Pod). Before installing, get your raw files there. The simplest pattern for a small dataset is a throwaway `kubectl cp` Pod that mounts the PVC; for production you'd typically use an init container with cloud-storage sync. Full staging recipe and manifests live in the [client ingestor README](https://github.com/tracebloc/client/blob/develop/ingestor/README.md#stage-your-data-on-the-shared-pvc).
+
+For multi-node or EKS deployments where the PVC isn't backed by a local host path, use a throwaway `kubectl cp` Pod or a cloud-storage init container instead. See the [client ingestor README](https://github.com/tracebloc/client/blob/develop/ingestor/README.md#stage-your-data-on-the-shared-pvc) for those recipes.
+
### 3. Write your `ingest.yaml`
@@ -71,13 +87,32 @@ The top-level shape (`apiVersion`, `kind`, `category`, `table`, `intent`, `label
### 4. Install once per dataset
+Each ingestor run is one-shot: it validates your data, copies files into the destination directory on the PVC, inserts rows into MySQL, sends metadata to the platform, then exits. **Run it twice per dataset** (once with `intent: train`, once with `intent: test`), using distinct `table:` names. The example below shows both releases:
+
```bash
-helm install my-cats-dogs tracebloc/ingestor \
+# Train release: the ingest.yaml from step 3, saved as ingest-train.yaml (table: cats_dogs_train, intent: train)
+helm install cats-dogs-train tracebloc/ingestor \
--namespace \
- --set-file ingestConfig=./ingest.yaml
+ --set-file ingestConfig=./ingest-train.yaml
+
+# Test release — same shape, with table: cats_dogs_test and intent: test
+helm install cats-dogs-test tracebloc/ingestor \
+ --namespace \
+ --set-file ingestConfig=./ingest-test.yaml
+```
+
+Each `helm install` is a separate release (the first argument is the release name), so the two runs don't collide. The ingestor Pod picks up `CLIENT_ID` / `CLIENT_PASSWORD` automatically from the Kubernetes Secret the parent `tracebloc/client` chart created in `` at install time — you don't pass credentials on the `helm install` command.
+
+
+**Validation error like `'' is not one of [...]` or `Additional properties are not allowed ( was unexpected)`?** This comes from the cluster's `jobs-manager` validating against its own bundled schema at submit time — the deployed schema is older than the ingestor image you're installing. `helm repo update` won't fix it (that only refreshes the local chart index, not the running server). The fix is on the cluster side: upgrade the parent chart so jobs-manager redeploys with the current schema.
+
+```bash
+helm upgrade tracebloc/client \
+ -n --reset-then-reuse-values
```
-The ingestor runs once: validates your data, copies files into the destination directory on the PVC, inserts rows into MySQL, sends metadata to the tracebloc backend, then exits. Repeat per dataset (one helm release per dataset, with different `table:` and `intent:` for train and test).
+Then re-run the `helm install` command above.
+
Full chart docs (data-staging recipe, schema, every category, update model, verification, override knobs) → [client ingestor README](https://github.com/tracebloc/client/blob/develop/ingestor/README.md).
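
The train and test releases in step 4 differ only in `table:` and `intent:`, so one way to keep the two files in sync is to derive the test file from the train file. This is a minimal sketch, assuming the train file contains an `intent: train` line and a `table:` value ending in `_train`, as in the `cats_dogs_train` example. The function `derive_test_config` is hypothetical and not part of the chart.

```bash
# Hypothetical helper: write a test-intent copy of a train ingest config.
derive_test_config() {
  local src="$1" dst="$2"
  # Rewrite only the two keys that differ between the releases;
  # every other line is copied unchanged.
  sed -E \
    -e 's/^([[:space:]]*intent:[[:space:]]*)train[[:space:]]*$/\1test/' \
    -e 's/^([[:space:]]*table:[[:space:]]*.*)_train[[:space:]]*$/\1_test/' \
    "$src" > "$dst"
}

# Usage: derive_test_config ingest-train.yaml ingest-test.yaml
```

If your table names do not follow the `_train` / `_test` suffix convention, edit the second file by hand instead.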
diff --git a/create-use-case/templates.mdx b/create-use-case/templates.mdx
index 5b95571..72ac550 100644
--- a/create-use-case/templates.mdx
+++ b/create-use-case/templates.mdx
@@ -14,6 +14,7 @@ Each task tracebloc supports comes with a runnable data-ingestion template — a
| Keypoint detection | [`templates/keypoint_detection`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/keypoint_detection) |
| Semantic segmentation | [`templates/semantic_segmentation`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/semantic_segmentation) |
| Text classification | [`templates/text_classification`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/text_classification) |
+| Masked language modeling | [`templates/masked_language_modeling`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/masked_language_modeling) |
| Tabular classification | [`templates/tabular_classification`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/tabular_classification) |
| Tabular regression | [`templates/tabular_regression`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/tabular_regression) |
| Time series forecasting | [`templates/time_series_forecasting`](https://github.com/tracebloc/data-ingestors/tree/develop/templates/time_series_forecasting) |
diff --git a/docs.json b/docs.json
index 042d90b..c42a987 100644
--- a/docs.json
+++ b/docs.json
@@ -75,9 +75,22 @@
{
"group": "Environment Setup",
"pages": [
- "environment-setup/setup-guide",
+ "environment-setup/overview",
+ "environment-setup/quickstart",
+ {
+ "group": "Deployment environments",
+ "pages": [
+ "environment-setup/deployment-environments",
+ "environment-setup/deploy-local",
+ "environment-setup/deploy-bare-metal",
+ "environment-setup/eks-client-deployment-guide",
+ "environment-setup/deploy-aks",
+ "environment-setup/deploy-openshift"
+ ]
+ },
"environment-setup/configuration",
- "environment-setup/eks-client-deployment-guide",
+ "environment-setup/operations",
+ "environment-setup/security",
"environment-setup/troubleshooting"
]
},
diff --git a/environment-setup/configuration.mdx b/environment-setup/configuration.mdx
index d626845..6e2544b 100644
--- a/environment-setup/configuration.mdx
+++ b/environment-setup/configuration.mdx
@@ -3,11 +3,13 @@ title: "Configuration"
description: "Customize your tracebloc workspace — environment variables, cluster management, GPU support, and manual Helm deployment."
---
-The installer uses sensible defaults. This page covers everything you can change — from cluster naming and port mapping to GPU configuration, manual Helm deployment, and day-to-day cluster management.
+The installer uses sensible defaults; this page covers what you can change.
+
+**Installed with the one-liner?** See [Installer Options](#installer-options), [Cluster Management](#cluster-management), and [GPU Support](#gpu-support). **Deploying into your own cluster with Helm** (EKS, AKS, bare-metal)? Jump to [Manual Deployment](#manual-deployment).
## Installer Options
-Override defaults by setting environment variables before the install command. Useful when you need a custom cluster name, multiple worker nodes, or non-standard ports.
+Override defaults by setting environment variables before the install command. Useful for a custom cluster name, extra worker nodes, or a different data directory.
| Variable | Default | Description |
|----------|---------|-------------|
@@ -15,8 +17,6 @@ Override defaults by setting environment variables before the install command. U
| `SERVERS` | `1` | Number of control-plane nodes |
| `AGENTS` | `1` | Number of worker nodes |
| `K8S_VERSION` | `v1.29.4-k3s1` | k3s image tag |
-| `HTTP_PORT` | `80` | Host port mapped to cluster HTTP ingress |
-| `HTTPS_PORT` | `443` | Host port mapped to cluster HTTPS ingress |
| `HOST_DATA_DIR` | `~/.tracebloc` | Persistent data directory on host |
Example — custom cluster name with two worker nodes:
@@ -45,7 +45,7 @@ k3d cluster delete tracebloc
The jobs manager is the main tracebloc process. Check its logs when debugging connectivity or job execution issues:
```bash
-kubectl logs -n -l app=tracebloc-jobs-manager
+kubectl logs -n -l app=manager
```
### Useful commands
@@ -64,7 +64,7 @@ Install logs are saved to `~/.tracebloc/install-*.log`.
## GPU Support
-The installer auto-detects GPU hardware and configures the cluster accordingly. No manual setup required on Linux — the installer handles drivers, container toolkit, and Kubernetes device plugin.
+GPU setup is automatic on Linux: the installer detects your hardware and sets up drivers, the container toolkit, and the Kubernetes device plugin.
### NVIDIA (Linux)
@@ -94,7 +94,7 @@ The installer does **not** install GPU drivers on Windows. Pre-install NVIDIA dr
Skip the installer entirely. Use this if you already have a Kubernetes cluster, need custom resource limits, or want full control over the Helm deployment.
-A single unified chart — **`tracebloc/client`** — supports AKS, EKS, bare-metal, and OpenShift. Platform behaviour is selected via values overrides; reference defaults live in the repo at [`client/ci/{aks,eks,bm,oc}-values.yaml`](https://github.com/tracebloc/client/tree/main/client/ci).
+A single chart — **`tracebloc/client`** — supports AKS, EKS, bare-metal, and OpenShift; choose your platform via values overrides. Reference defaults live at [`client/ci/{aks,eks,bm,oc}-values.yaml`](https://github.com/tracebloc/client/tree/main/client/ci).
### Add the Helm repository
@@ -266,7 +266,7 @@ env:
#### Auto-upgrade (on by default)
-Releases of chart `1.3.0+` install a `-auto-upgrade` CronJob that polls `https://tracebloc.github.io/client` daily and runs `helm upgrade --reset-then-reuse-values` whenever a newer chart version is published. Closes [tracebloc/client#69](https://github.com/tracebloc/client/issues/69) — older deployed clients no longer drift from the latest secure release.
+Releases of chart `1.3.0+` install a `-auto-upgrade` CronJob that polls `https://tracebloc.github.io/client` daily and runs `helm upgrade --reset-then-reuse-values` whenever a newer chart version is published, so workspaces auto-update instead of staying pinned to the version they were installed with.
```yaml
autoUpgrade:
diff --git a/environment-setup/deploy-aks.mdx b/environment-setup/deploy-aks.mdx
new file mode 100644
index 0000000..d4b4041
--- /dev/null
+++ b/environment-setup/deploy-aks.mdx
@@ -0,0 +1,54 @@
+---
+title: "Azure AKS"
+description: "Deploy a tracebloc workspace on Azure Kubernetes Service."
+---
+
+**When to pick it** — You're on Azure and want managed Kubernetes with autoscaling and GPU node pools.
+
+## Prerequisites
+
+- An AKS cluster, and `kubectl` pointed at it (`az aks get-credentials …`).
+- **Helm 3.x**.
+- Your **Client ID** and password from the [clients page](https://ai.tracebloc.io/clients).
+
+## Install
+
+```bash
+helm repo add tracebloc https://tracebloc.github.io/client
+helm repo update
+helm show values tracebloc/client > values.yaml   # then edit as described below
+helm upgrade --install tracebloc tracebloc/client \
+ -n tracebloc --create-namespace -f values.yaml
+```
+
+Set your credentials in `values.yaml` (see [Configuration → Authentication](/environment-setup/configuration#authentication)).
+
+## Verify
+
+```bash
+kubectl get pods -n tracebloc
+```
+
+The tracebloc pods should be `Running`, and your workspace should read **Online** on the clients page.
+
+## Environment-specific config
+
+Use Azure Files for shared storage:
+
+```yaml
+storageClass:
+ create: true
+ provisioner: file.csi.azure.com
+ parameters:
+ skuName: Standard_LRS
+ mountOptions: [dir_mode=0750, file_mode=0640, uid=999, gid=999, mfsymlinks, cache=strict, actimeo=30]
+clusterScope: true
+```
+
+- **NetworkPolicy:** create the AKS cluster with `--network-policy azure` (Azure NPM) or Calico — otherwise the training-pod egress lockdown won't enforce.
+- `metrics-server` is bundled on AKS.
+
+## Production notes
+
+- Use GPU node pools for training workloads; size requests/limits per job.
+- Day-2 upgrades and rollbacks: see [Operations](/environment-setup/operations).
diff --git a/environment-setup/deploy-bare-metal.mdx b/environment-setup/deploy-bare-metal.mdx
new file mode 100644
index 0000000..222d1b9
--- /dev/null
+++ b/environment-setup/deploy-bare-metal.mdx
@@ -0,0 +1,58 @@
+---
+title: "Bare-metal"
+description: "Deploy a tracebloc workspace on your own on-prem Kubernetes cluster."
+---
+
+**When to pick it** — You already run Kubernetes on-prem (k3s, kubeadm, RKE, …) and want a workspace on it, with full control over scheduling and storage.
+
+## Prerequisites
+
+- A running Kubernetes cluster and `kubectl` access.
+- **Helm 3.x**.
+- `metrics-server` installed (the resource monitor needs it).
+- Your **Client ID** and password from the [clients page](https://ai.tracebloc.io/clients).
+
+## Install
+
+```bash
+helm repo add tracebloc https://tracebloc.github.io/client
+helm repo update
+helm show values tracebloc/client > values.yaml   # then edit as described below
+helm upgrade --install tracebloc tracebloc/client \
+ -n tracebloc --create-namespace -f values.yaml
+```
+
+Set your credentials in `values.yaml` (see [Configuration → Authentication](/environment-setup/configuration#authentication)).
+
+## Verify
+
+```bash
+kubectl get pods -n tracebloc
+```
+
+The tracebloc pods should be `Running`, and your workspace should read **Online** on the clients page.
+
+## Environment-specific config
+
+Use hostPath-backed volumes:
+
+```yaml
+hostPath:
+ enabled: true
+pvcAccessMode: ReadWriteOnce
+storageClass:
+ create: true
+ provisioner: kubernetes.io/no-provisioner
+namespace:
+ podSecurity:
+ enforce: "" # hostPath needs the privileged init-mysql-data container
+clusterScope: true
+```
+
+- **NetworkPolicy** (training-pod egress lockdown) only enforces if your CNI supports it — Calico, Cilium, or kube-router. Flannel alone does **not** enforce.
+
+## Production notes
+
+- Schedule MySQL and storage on reliable nodes; back up the data PVCs.
+- Size training compute per job via `RESOURCE_REQUESTS` / `RESOURCE_LIMITS` ([Configuration](/environment-setup/configuration#resource-limits-for-training-jobs)).
+- Day-2 upgrades and rollbacks: see [Operations](/environment-setup/operations).
diff --git a/environment-setup/deploy-local.mdx b/environment-setup/deploy-local.mdx
new file mode 100644
index 0000000..2fa8e21
--- /dev/null
+++ b/environment-setup/deploy-local.mdx
@@ -0,0 +1,55 @@
+---
+title: "Local / k3d"
+description: "Run a tracebloc workspace on a single machine — laptop or on-prem server. Production-capable."
+---
+
+**When to pick it** — A single machine you own: a laptop to try things, or an on-prem server you run in production. The installer brings up a self-contained Kubernetes cluster (k3d) inside Docker — you don't need a cluster of your own.
+
+## Prerequisites
+
+- A machine: macOS, Linux, or Windows · 2 CPU · 4 GB RAM · 20 GB free disk.
+- Your **Client ID** and password from the [clients page](https://ai.tracebloc.io/clients).
+
+That's it — no Docker or Kubernetes knowledge needed. The installer sets up Docker and the cluster for you.
+
+## Install
+
+
+
+ ```bash
+ bash <(curl -fsSL https://tracebloc.io/i.sh)
+ ```
+
+
+ ```powershell
+ irm https://tracebloc.io/i.ps1 | iex
+ ```
+
+
+
+See [Quick Start](/environment-setup/quickstart) for the full walkthrough, including the inspect-first option.
+
+## Verify
+
+```bash
+kubectl get pods -A
+```
+
+The tracebloc pods should be `Running`, and your workspace should read **Online** on the clients page.
+
+## Environment-specific config
+
+Set these as environment variables before the install command (full list in [Configuration](/environment-setup/configuration#installer-options)):
+
+```bash
+CLUSTER_NAME=my-cluster AGENTS=2 HOST_DATA_DIR=/data/tracebloc bash <(curl -fsSL https://tracebloc.io/i.sh)
+```
+
+GPUs are auto-detected on Linux (NVIDIA/AMD) — drivers, container toolkit, and device plugin are installed for you.
+
+## Production notes
+
+- **Local is production-capable.** Point it at a server rather than a laptop and it's a real deployment.
+- Your data persists in `HOST_DATA_DIR` across stop/start cycles (see [Operations](/environment-setup/operations)).
+- Need more headroom? Re-run the installer on a bigger machine, or add worker nodes with `AGENTS`.
+- For multi-node high availability, use [bare-metal](/environment-setup/deploy-bare-metal) or a managed cloud instead.
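
The minimums listed under Prerequisites for the local install (2 CPU, 4 GB RAM, 20 GB free disk) can be checked before running the installer. The sketch below is an illustration only: `meets_minimum` is a hypothetical helper, not something the installer ships, and it takes the three numbers as arguments because gathering them differs between macOS, Linux, and Windows.

```bash
# Hypothetical preflight check against the minimums listed under Prerequisites
# (2 CPU, 4 GB RAM, 20 GB free disk). Arguments are whole numbers.
meets_minimum() {
  local cpus="$1" ram_gb="$2" disk_gb="$3" ok=0
  [ "$cpus" -ge 2 ]     || { echo "need 2 CPUs, found $cpus"; ok=1; }
  [ "$ram_gb" -ge 4 ]   || { echo "need 4 GB RAM, found $ram_gb"; ok=1; }
  [ "$disk_gb" -ge 20 ] || { echo "need 20 GB free disk, found $disk_gb"; ok=1; }
  # Returns 0 when every minimum is met, 1 otherwise.
  return "$ok"
}

# Example on Linux (how you gather the numbers is up to you):
#   meets_minimum "$(nproc)" 8 50
```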
diff --git a/environment-setup/deploy-openshift.mdx b/environment-setup/deploy-openshift.mdx
new file mode 100644
index 0000000..e594f0c
--- /dev/null
+++ b/environment-setup/deploy-openshift.mdx
@@ -0,0 +1,58 @@
+---
+title: "OpenShift"
+description: "Deploy a tracebloc workspace on Red Hat OpenShift or OKD."
+---
+
+**When to pick it** — You run Red Hat OpenShift (or OKD) and need the workspace to fit its security model (SCCs, OVN networking).
+
+## Prerequisites
+
+- An OpenShift cluster, and `oc` / `kubectl` access.
+- **Helm 3.x**.
+- Your **Client ID** and password from the [clients page](https://ai.tracebloc.io/clients).
+
+## Install
+
+```bash
+helm repo add tracebloc https://tracebloc.github.io/client
+helm repo update
+helm show values tracebloc/client > values.yaml   # then edit as described below
+helm upgrade --install tracebloc tracebloc/client \
+ -n tracebloc --create-namespace -f values.yaml
+```
+
+Set your credentials in `values.yaml` (see [Configuration → Authentication](/environment-setup/configuration#authentication)).
+
+## Verify
+
+```bash
+oc get pods -n tracebloc
+```
+
+The tracebloc pods should be `Running`, and your workspace should read **Online** on the clients page.
+
+## Environment-specific config
+
+```yaml
+storageClass:
+ create: false
+ name: ocs-storagecluster-cephfs
+clusterScope: false
+openshift:
+ scc:
+ enabled: true # SCC for the privileged resource-monitor
+networkPolicy:
+ training:
+ enabled: true
+ dnsNamespace: openshift-dns
+ dnsSelector:
+ dns.operator.openshift.io/daemonset-dns: default
+```
+
+- OVN-Kubernetes **enforces** NetworkPolicy by default, so the training-pod egress lockdown works out of the box.
+- `metrics-server` is present on OpenShift.
+
+## Production notes
+
+- The bundled SCC grants the resource-monitor the host access it needs — review it against your cluster policy.
+- Size training compute per job; day-2 management is in [Operations](/environment-setup/operations).
diff --git a/environment-setup/deployment-environments.mdx b/environment-setup/deployment-environments.mdx
new file mode 100644
index 0000000..d3fc419
--- /dev/null
+++ b/environment-setup/deployment-environments.mdx
@@ -0,0 +1,28 @@
+---
+title: "Deployment environments"
+description: "Run tracebloc anywhere — local, bare-metal, EKS, AKS, or OpenShift. Same chart, same steps, your choice of infrastructure."
+---
+
+tracebloc runs the same way everywhere: one chart (`tracebloc/client`), one set of steps, your choice of infrastructure. **Local is a first-class production option** — a workspace on an on-prem server is every bit as real as one on a managed cloud cluster.
+
+## Pick your environment
+
+| Environment | Runs on | GPU | Best when |
+|---|---|---|---|
+| [Local / k3d](/environment-setup/deploy-local) | One machine you own | NVIDIA / AMD, auto-detected | A laptop or a single on-prem server — the fastest start |
+| [Bare-metal](/environment-setup/deploy-bare-metal) | Your own Kubernetes cluster | Your nodes | You already run on-prem Kubernetes |
+| [Amazon EKS](/environment-setup/eks-client-deployment-guide) | AWS (managed) | GPU nodegroups | You're on AWS and want managed, autoscaling compute |
+| [Azure AKS](/environment-setup/deploy-aks) | Azure (managed) | GPU node pools | You're on Azure |
+| [OpenShift](/environment-setup/deploy-openshift) | OpenShift / OKD | Your nodes | You run Red Hat OpenShift |
+
+
+Just want it running fast on one machine? The [Quick Start](/environment-setup/quickstart) one-liner is the local path with zero configuration.
+
+
+## Every environment, same shape
+
+Each guide follows the same six headings, so you always know where to look:
+
+**When to pick it · Prerequisites · Install · Verify · Environment-specific config · Production notes.**
+
+Adding a new target later (GKE, on-prem k3s, …) means filling the same template. All environments deploy the same chart — see [Configuration](/environment-setup/configuration) for every value, and [Operations](/environment-setup/operations) for day-2 management.
diff --git a/environment-setup/eks-client-deployment-guide.mdx b/environment-setup/eks-client-deployment-guide.mdx
index bde8421..140bb7a 100644
--- a/environment-setup/eks-client-deployment-guide.mdx
+++ b/environment-setup/eks-client-deployment-guide.mdx
@@ -5,12 +5,15 @@ description: "Step-by-step guide to deploy Tracebloc on Amazon EKS. Build a prod
## Overview
+
+**Use EKS when you need** multi-node, autoscaling, or shared GPU clusters on AWS. For a single machine (a laptop or one server), the [local installer](/environment-setup/deploy-local) is simpler and faster.
+
Running machine learning workloads in the cloud often requires a reliable, secure, and scalable infrastructure—yet setting it up can be complex. This guide walks you through building a complete Amazon EKS (Elastic Kubernetes Service) environment from scratch using the AWS CLI. By following these steps, you'll create a production-ready foundation with networking, GPU-optional compute, storage, and security fully aligned with AWS and Kubernetes best practices.
Once the infrastructure is in place, you'll deploy and configure the tracebloc client to securely train and benchmark AI models. This setup ensures that your proprietary data stays within your environment, while still allowing external AI models to be tested and fine-tuned in a controlled, isolated way. The result: a scalable, secure platform for high-performance ML workloads that accelerates collaboration with external experts while maintaining full control over your data and IP.
-The entire setup can be completed in about 1–2 hours.
+The entire setup takes ~1–2 hours.
If the cluster is already up and you are just adding another client to it, skip the cluster-creation steps and go straight to ["Client Configuration"](#5-client-configuration).
@@ -44,7 +47,7 @@ aws configure set region eu-central-1
```
#### Required Permissions
-Your AWS user/role should have permissions for:
+Your AWS user or role needs permissions for:
- Amazon EKS cluster management
- VPC and networking resources
- EC2 instances and security groups
@@ -137,6 +140,8 @@ Together, these measures ensure that external models can be deployed safely into
## Quick Setup
+Quick Setup runs an automated script that builds the whole cluster in one go. Want step-by-step control (or to customize networking)? Use [Detailed Setup](#detailed-setup) instead.
+
### Purpose
Spin up a production-ready EKS baseline (VPC, subnets, internet gateway, EKS cluster, managed nodegroup, EFS + CSI driver) in one go. Includes basic validation, colored logging, and a cleanup mode.
@@ -167,7 +172,7 @@ Run `./setup_eks.sh cleanup` to remove cluster, nodegroup, EFS, subnets, gateway
- **Costs**: This creates billable resources (EC2, EKS, EFS, data transfer). Remove when not needed.
- **Network model**: Subnets are configured to auto-assign public IPs for simplicity. Adjust to private subnets + NAT as needed.
- **Kubernetes version**: The script requests `--kubernetes-version 1.32`; update if your region/account supports a different current version.
-- **Security hardening**: Treat this as a solid baseline; adapt SGs, private subnets, IRSA, and PodSecurity/OPA as required by your environment.
+- **Security hardening**: This is a production baseline; harden further for your environment (security groups, private subnets, IRSA, Pod Security/OPA).
If you prefer more control over your setup and want to customize the environment to your needs, follow the step-by-step guide below.
@@ -454,7 +459,7 @@ Creates a nodegroup with `t3.medium` instances (2 vCPUs, 4 GiB memory) spread ac
#### Training Nodegroup
-This group runs your ML training workloads and **must be sized appropriately** to provide sufficient memory and compute. Consider dataset size, model type, the number of parallel workloads and whether GPU acceleration is needed. Select instance types and scaling parameters carefully, based on the kind of models you expect to train and the resources they demand.
+This group runs your ML training workloads — size it for your dataset, model type, number of parallel workloads, and whether you need GPUs.
Refer to the [EC2 instance types list](https://aws.amazon.com/ec2/instance-types) and [EKS managed nodegroups docs](https://docs.aws.amazon.com/eks/latest/userguide/create-managed-node-group.html) for guidance.
diff --git a/environment-setup/operations.mdx b/environment-setup/operations.mdx
new file mode 100644
index 0000000..e256445
--- /dev/null
+++ b/environment-setup/operations.mdx
@@ -0,0 +1,81 @@
+---
+title: "Operations"
+description: "Run, monitor, upgrade, and maintain a tracebloc workspace day to day."
+---
+
+Everything you do *after* your workspace is running. Commands assume the default namespace `tracebloc` — substitute yours if you changed it.
+
+## Which version am I on?
+
+```bash
+helm list -n tracebloc # CHART column shows client-<version>
+```
+
+The install summary also prints the version, and `--diagnose` reports it on its first line.
+
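To read the version from a script rather than by eye, the chart name can be parsed out of the `helm list` output. The sketch below assumes the CHART column follows Helm's usual `<chart>-<version>` form with the chart named `client`; the function name `chart_version` is illustrative and not part of any tracebloc tooling.

```bash
# chart_version: read `helm list` output on stdin and print the version
# suffix of the first chart named client-<version>.
# Assumption: the chart version starts with a digit (e.g. client-1.2.3).
chart_version() {
  grep -oE 'client-[0-9][^[:space:]]*' | head -n 1 | sed 's/^client-//'
}

# Usage (requires helm and a running workspace):
#   helm list -n tracebloc | chart_version
```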
+## Health & status
+
+```bash
+kubectl get pods -n tracebloc # all workspace pods Running?
+kubectl get pods -n tracebloc-node-agents # the resource-monitor DaemonSet
+```
+
+Then check your [clients page](https://ai.tracebloc.io/clients) — your workspace should read **Online**.
+
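For an unattended check, the pod listing can be filtered down to the pods that need attention. This is a minimal sketch, assuming the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, ...); `unhealthy_pods` is a hypothetical helper name.

```bash
# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print the name and status of every pod that is neither Running nor Completed.
# Note: this looks at STATUS only; a Running pod that is not yet Ready
# (for example READY 0/1) is not reported.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get pods -n tracebloc --no-headers | unhealthy_pods
```

Empty output means every pod is in a healthy state, which makes the helper easy to use in a cron job or monitoring hook.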
+## Logs
+
+```bash
+kubectl logs -n tracebloc -l app=manager --tail=200 -f
+```
+
+## Stop & start (local / k3d)
+
+Free up your machine without losing anything — data persists between stops.
+
+```bash
+k3d cluster stop tracebloc # frees CPU/RAM
+k3d cluster start tracebloc # resume where you left off
+```
+
+## Upgrade
+
+The auto-upgrade CronJob keeps your workspace current by default. To upgrade manually:
+
+```bash
+helm repo update
+helm upgrade tracebloc tracebloc/client -n tracebloc --reset-then-reuse-values
+```
+
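+`--reset-then-reuse-values` resets to the new chart's defaults, then re-applies the values from your last release, so the settings the installer applied are preserved. Append `--version <version>` to pin a specific release.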
+
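`--reset-then-reuse-values` is a relatively recent Helm flag. The sketch below assumes it requires Helm 3.14 or later, so verify that against the release notes for your Helm version. The function name `helm_at_least_3_14` is hypothetical.

```bash
# helm_at_least_3_14: succeed if the given version string (as printed by
# `helm version --short`, e.g. "v3.14.2+gc309b6f") is 3.14 or newer.
# Assumption: 3.14 is the first release with --reset-then-reuse-values.
helm_at_least_3_14() {
  ver="${1#v}"          # drop the leading "v"
  major="${ver%%.*}"    # text before the first dot
  rest="${ver#*.}"      # text after the first dot
  minor="${rest%%.*}"   # text before the next dot
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 14 ]; }
}

# Usage (requires helm):
#   helm_at_least_3_14 "$(helm version --short)" || echo "Helm is too old for --reset-then-reuse-values"
```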
+## Roll back
+
+```bash
+helm history tracebloc -n tracebloc # find the revision to return to
+helm rollback tracebloc <revision> -n tracebloc # omit <revision> to go back one release
+```
+
+## Move to another machine
+
+The workspace's identity is its **Client ID**, not the machine. To relocate: run the installer on the new host with the **same Client ID**, then re-ingest your datasets (or copy `~/.tracebloc` from the old host). Uninstall the workspace on the old host once the new one shows **Online**.
+
+## Uninstall
+
+
+
+ ```bash
+ k3d cluster delete tracebloc # removes the cluster and your workspace
+ ```
+
+
+ ```bash
+ helm uninstall tracebloc -n tracebloc
+ ```
+
+
+
+PVCs are annotated `helm.sh/resource-policy: keep`, so your data survives an uninstall. To remove it too: `kubectl delete pvc --all -n tracebloc`.
+
+## Back up
+
+Your data lives in the data PVCs (or `~/.tracebloc` on a local install). Back up that directory / those volumes on your normal schedule — tracebloc keeps nothing of yours off your infrastructure.
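
For a local install, a backup can be as simple as a timestamped archive of `~/.tracebloc`. This is a minimal sketch: the function name `backup_dir`, the `tracebloc-backup-` file prefix, and the destination directory are all illustrative. Stopping the cluster first, so files are not changing mid-copy, is a precaution assumed here rather than a documented requirement.

```bash
# backup_dir SRC DEST_DIR: write SRC into DEST_DIR as a timestamped .tgz
# archive and print the archive path.
backup_dir() {
  src="$1"
  dest_dir="$2"
  archive="$dest_dir/tracebloc-backup-$(date +%Y%m%d-%H%M%S).tgz"
  mkdir -p "$dest_dir" || return 1
  # -C keeps paths in the archive relative to the parent of SRC.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}

# Usage on a local / k3d install:
#   k3d cluster stop tracebloc
#   backup_dir "$HOME/.tracebloc" "$HOME/backups"
#   k3d cluster start tracebloc
```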
diff --git a/environment-setup/overview.mdx b/environment-setup/overview.mdx
new file mode 100644
index 0000000..fc9c5f4
--- /dev/null
+++ b/environment-setup/overview.mdx
@@ -0,0 +1,93 @@
+---
+title: "Overview"
+description: "How tracebloc runs on your infrastructure — and why your data never leaves it."
+---
+
+**Models come to your data — your data never leaves your infrastructure.** You run a tracebloc *workspace* on your own hardware. Contributors submit models that are scanned and run in isolation against your data, on your machines. Only results leave — and trained model weights only if you choose to share them.
+
+## The trust boundary
+
+```mermaid
+flowchart LR
+ subgraph YOURS["Your infrastructure"]
+ DATA["Your datasets — raw data"]
+ CLIENT["tracebloc workspace"]
+ TRAIN["Isolated training"]
+ DATA --> TRAIN
+ CLIENT --> TRAIN
+ end
+ CONTRIB["Contributors"] -->|submit models| PLATFORM["tracebloc platform"]
+ PLATFORM -->|scanned models in| CLIENT
+ CLIENT -->|"results out — TLS, outbound-only"| PLATFORM
+ style YOURS fill:#f2fafc,stroke:#0184A3,stroke-width:2px
+ style DATA fill:#e6f6fb,stroke:#0184A3
+```
+
+Raw data stays inside your infrastructure. Your workspace opens an **outbound-only** connection — nothing reaches in to pull your data out.
+
+## How it works
+
+
+
+ One command sets up your private workspace on your own infrastructure — a laptop, an on-prem server, or a cloud cluster.
+
+
+ Stage your training and test data locally. Metadata syncs to the platform so contributors can see what's available — **the raw data never moves.**
+
+
+ Define a use case from your datasets and set how submitted models are evaluated.
+
+
+ Whitelist contributors by email. Only the people you invite can take part.
+
+
+ Each model is **scanned for vulnerabilities**, then trains against your data in an **isolated container**, on your hardware.
+
+
+ Training and evaluation results flow back to you over TLS. Trained model weights are shared **only if you choose to** — you control that in the admin panel.
+
+
+
+## What stays, what leaves
+
+| Data | Shared with the platform? |
+|---|---|
+| Raw training & test data | **Never** — it stays on your infrastructure |
+| Dataset metadata (schema, row counts) | Yes — so contributors know what's available |
+| Training & evaluation results | Yes — the metrics models are judged on |
+| Trained model weights | **Only if you allow it** — your choice per collaboration, set in the admin panel |
+
+Enforced by per-job container isolation, a NetworkPolicy that blocks data egress from training pods, a vulnerability scan before any model runs, and TLS on all traffic.
+
+
+**The mental model:** `1 machine = 1 workspace = n datasets`. One deployment per machine; as many datasets inside it as you like.
+
+
+## Vocabulary
+
+| Term | What it is |
+|---|---|
+| **Workspace** | Your private, dedicated AI environment — tracebloc's software running on your own infrastructure, where you invite contributors to train models on your data |
+| **Client ID** | The credential that connects your workspace to the platform (created on the clients page) |
+| **Dataset** | Data you've staged locally for use cases |
+| **Use case** | A task contributors build models for, against your datasets |
+| **Contributor** | An external data scientist who submits models — and never sees your raw data |
+
+## What it touches
+
+The installer changes only **Docker** and **`~/.tracebloc`** on your host — no system-wide changes. Uninstalling is a single command, and your data is yours throughout.
+
+## Get started
+
+
+
+ A running workspace in about 10 minutes, with one command.
+
+
+ Deploy on local / k3d, bare-metal, EKS, AKS, or OpenShift.
+
+
+
+
+Want the guarantees in detail for your security team? See [Security & data handling](/environment-setup/security).
+
diff --git a/environment-setup/quickstart.mdx b/environment-setup/quickstart.mdx
new file mode 100644
index 0000000..07d2087
--- /dev/null
+++ b/environment-setup/quickstart.mdx
@@ -0,0 +1,81 @@
+---
+title: "Quick Start"
+description: "From zero to a running tracebloc workspace in about 10 minutes."
+---
+
+A running workspace in about 10 minutes, with one command. Deploying on a specific environment instead (EKS, AKS, bare-metal, OpenShift)? See [Deployment environments](/environment-setup/deployment-environments).
+
+
+**No Docker or Kubernetes knowledge needed.** The installer sets up the whole container stack for you — on macOS and Linux it even installs Docker if it's missing. You just need a machine.
+
+
+## Before you start
+
+| You need | Minimum |
+|---|---|
+| A machine | macOS, Linux, or Windows · 2 CPU · 4 GB RAM · 20 GB free disk |
+| A tracebloc account | [Sign up free](https://ai.tracebloc.io) — no credit card |
+
+The installer still runs on machines below these minimums; it only prints a warning. Extra RAM matters mainly once models start training.
+
+
+
+ On the [clients page](https://ai.tracebloc.io/clients), click **+** and note your **Client ID** and **password** — you'll enter them during install.
+
+
+
+
+
+ ```bash
+ bash <(curl -fsSL https://tracebloc.io/i.sh)
+ ```
+
+
+ ```powershell
+ irm https://tracebloc.io/i.ps1 | iex
+ ```
+
+
+
+ **What to expect:** about 5 minutes. It asks for your Client ID and password, installs Docker (if missing) and a local Kubernetes cluster, and touches only `~/.tracebloc` and Docker. Safe to re-run anytime.
+
+
+ Every install script is open source. Download, inspect, then run:
+
+ ```bash
+ curl -fsSL https://tracebloc.io/i.sh -o install.sh
+ less install.sh # review it
+ bash install.sh
+ ```
+
+ Source: [github.com/tracebloc/client](https://github.com/tracebloc/client/blob/main/scripts/install.sh). Release binaries are cosign-signed, so you can verify their signature before trusting them.
+
+
+
+
+ ```bash
+ kubectl get pods -A
+ ```
+
+ Look for the tracebloc pods (`mysql-client`, `…-jobs-manager`, `…-requests-proxy`) in `Running` state. Then open your [clients page](https://ai.tracebloc.io/clients) — your workspace should show **Online**.
+
+
+
+
+**What changed on your machine** — only Docker and `~/.tracebloc`. To remove everything: `k3d cluster delete tracebloc`.
+
+
+## Locked-down environment?
+
+If your security policy forbids piping scripts from the internet, skip the one-liner and [deploy with Helm](/environment-setup/configuration#manual-deployment) instead — same result, full control over every step.
+
+## What's next
+
+
+
+ Ingest data so contributors can build models against it.
+
+
+ Define a task and invite contributors.
+
+
diff --git a/environment-setup/security.mdx b/environment-setup/security.mdx
new file mode 100644
index 0000000..4aacded
--- /dev/null
+++ b/environment-setup/security.mdx
@@ -0,0 +1,37 @@
+---
+title: "Security & data handling"
+description: "What stays on your infrastructure, what leaves, and how tracebloc enforces it — the page to share with your security team."
+---
+
+tracebloc is built so your data never has to leave your network. This page is the summary to hand to your security or compliance team.
+
+## What's shared, what isn't
+
+| Data | Shared with the platform? |
+|---|---|
+| Raw training & test data | **Never** — it stays on your infrastructure |
+| Dataset metadata (schema, row counts) | Yes — so contributors know what's available |
+| Training & evaluation results | Yes — the metrics models are judged on |
+| Trained model weights | **Only if you allow it** — your choice per collaboration, set in the admin panel |
+
+## How it's enforced
+
+- **Data locality.** Training runs against your data on your hardware. Raw data never crosses the boundary.
+- **Isolation.** Each training job runs in its own container with restricted system access; Kubernetes namespaces separate workloads.
+- **Network policy.** Training pods are denied data egress — they can't reach MySQL, other pods, or the Kubernetes API.
+- **Model scanning.** Submitted models are scanned for vulnerabilities (Bandit) before anything executes.
+- **Encryption in transit.** All workspace ↔ platform traffic is TLS, on an **outbound-only** connection.
+- **Access control.** Only contributors you whitelist by email can join a use case.
+- **Minimal footprint.** The installer touches only Docker and `~/.tracebloc` — no system-wide changes.
+
+## You control what leaves
+
+Trained weights are shared only when you choose to share them. Whom you collaborate with and whether weights are downloadable are both set in the admin panel, per use case.
+
+## Support bundles are redacted
+
+If support asks for diagnostics, `--diagnose` produces a bundle with **credentials removed** (passwords, tokens, and proxy secrets stripped before the archive is written). See [Troubleshooting](/environment-setup/troubleshooting).
+
+## Outbound access
+
+Your workspace needs outbound HTTPS to: `*.docker.io`, `ghcr.io`, `raw.githubusercontent.com`, `*.github.io`, `*.tracebloc.io`, and `pypi.org`. Nothing needs to reach *in*.
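
Before installing behind a firewall or proxy, the allow-list above can be spot-checked from the target machine. The sketch below is only an approximation: each wildcard is replaced with one concrete host, `registry-1.docker.io` is an assumption standing in for `*.docker.io`, and `*.github.io` is left out because the exact host is not stated here. The function names `probe` and `check_outbound` are illustrative.

```bash
# Hosts to probe, one concrete host per allow-list entry.
# registry-1.docker.io is an assumed stand-in for *.docker.io.
HOSTS="registry-1.docker.io ghcr.io raw.githubusercontent.com tracebloc.io ai.tracebloc.io pypi.org"

# probe HOST: succeed if an HTTPS connection to HOST can be established.
# Any HTTP status counts as reachable; only connection failures fail.
probe() {
  curl -sS -o /dev/null --connect-timeout 5 "https://$1"
}

# check_outbound: print one "ok" or "BLOCKED" line per host and return
# non-zero if any host was unreachable.
check_outbound() {
  failed=0
  for host in $HOSTS; do
    if probe "$host" 2>/dev/null; then
      echo "ok       $host"
    else
      echo "BLOCKED  $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (requires curl):
#   check_outbound || echo "ask your network team to allow the BLOCKED hosts"
```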
diff --git a/environment-setup/setup-guide.mdx b/environment-setup/setup-guide.mdx
index 9ba80e6..a9eb9bb 100644
--- a/environment-setup/setup-guide.mdx
+++ b/environment-setup/setup-guide.mdx
@@ -17,13 +17,13 @@ The installer runs on any modern machine (one host per workspace). These are the
| | Minimum | Recommended |
|---|---------|-------------|
-| **CPU** | 4 cores | 8+ cores |
-| **RAM** | 8 GB | 16+ GB |
+| **CPU** | 2 cores | 8+ cores |
+| **RAM** | 4 GB | 16+ GB |
| **Disk** | 20 GB free | 50+ GB free |
**Supported platforms:** macOS (Intel & Apple Silicon) · Linux (x86_64 & arm64) · Windows (x86_64 & arm64)
-**Outbound access needed:** The installer downloads container images and connects to the tracebloc platform. Make sure your network allows traffic to `*.docker.io`, `*.tracebloc.io`, `github.com`, and `pypi.org`.
+**Outbound access needed:** The installer pulls container images, the install scripts, and the Helm chart, then connects to the tracebloc platform. Allow traffic to `*.docker.io`, `ghcr.io`, `raw.githubusercontent.com`, `*.github.io`, `*.tracebloc.io`, and `pypi.org`.
---
@@ -123,7 +123,7 @@ To upgrade a one-liner install later, run `helm upgrade tracebloc/cl
### GPU Support
-The installer auto-detects GPU hardware and configures the cluster accordingly:
+The installer detects your GPU and configures the cluster:
- **Linux (NVIDIA/AMD)** — drivers, container toolkit, and Kubernetes device plugin are installed automatically. A reboot may be required after driver installation.
- **macOS** — CPU-only. For GPU workloads, deploy on a Linux machine or use [AWS (EKS)](/environment-setup/eks-client-deployment-guide).
diff --git a/environment-setup/troubleshooting.mdx b/environment-setup/troubleshooting.mdx
index 2f5ac9c..f2fd0ce 100644
--- a/environment-setup/troubleshooting.mdx
+++ b/environment-setup/troubleshooting.mdx
@@ -7,12 +7,22 @@ Most issues fall into a few categories: pods not starting, client not connecting
For real-time cluster monitoring, try [k9s](https://k9scli.io/): run `k9s -n <namespace>` to get a live view of pods, logs, and events.
+
+**Stuck? Generate a support bundle.** Re-run the installer with `--diagnose`:
+
+```bash
+bash <(curl -fsSL https://tracebloc.io/i.sh) --diagnose
+```
+
+It writes a redacted `~/.tracebloc/tracebloc-diagnose-.tgz` — logs, pod status, and versions with **credentials removed** — that you can send to support. The first line of output shows your client version.
+
+
## Quick Checks
| Symptom | Check | Fix |
|---------|-------|-----|
| Pods not starting | `kubectl describe pod <pod-name> -n <namespace>` | Check resource limits, Docker status |
-| Client shows Offline | `kubectl logs -n <namespace> -l app=tracebloc-jobs-manager` | Verify client ID/password, check network |
+| Client shows Offline | `kubectl logs -n <namespace> -l app=manager` | Verify client ID/password, check network |
| Docker not running | `docker info` | Start Docker Desktop or daemon |
| Cluster not found | `k3d cluster list` | Re-run the installer |
| GPU not detected | `nvidia-smi` | Install NVIDIA drivers, reboot, re-run installer |
diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..a2a4eae
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,5 @@
+User-agent: *
+Disallow: /cdn-cgi/
+Disallow: /mintlify-assets/
+
+Sitemap: https://docs.tracebloc.io/sitemap.xml