docs: make declarative-ingest staging self-contained (data-ingestors#131 B/C)#46

Open
divyasinghds wants to merge 1 commit into main from docs/fix-issue-131-declarative-staging

@divyasinghds (Contributor) commented Jun 1, 2026

Summary

Companion to tracebloc/data-ingestors#133, addressing the docs-side items from data-ingestors#131.

  • B2 / B1 — single canonical staging recipe. Section 2 of the declarative path described a kubectl cp Pod while the Detailed Setup section (further down on the same page) prescribed a host-path cp -R. Same page, two different staging mechanisms. Replaced section 2 with the inline host-path recipe that matches the Detailed Setup section, and demoted kubectl cp to a Note for multi-node / EKS deployments. The recipe now uses a <prefix> subdirectory so the path lines up with the /data/shared/<prefix>/... style used in the YAML examples.
  • C2 — credentials in the declarative path. Section 4 was silent on where CLIENT_ID / CLIENT_PASSWORD come from. Added a sentence noting the ingestor Pod inherits them from the Kubernetes Secret the parent tracebloc/client chart creates in <workspace> at install time. No creds are passed on the helm install command.
  • C5 — worked train + test example. The run-twice rule was buried in a trailing parenthetical. Promoted it to bolded prose and added a side-by-side example showing two helm install invocations with distinct release names, table:, and intent: values.

Out of scope

  • A-series (column names, PVC paths in example YAMLs) is shipping in data-ingestors#133.
  • B3 (object_id / object_count columns in object_detection README) and C-series polish items C1/C3/C4 still need maintainer attention in the data-ingestors repo.

Test plan

  • mint dev and walk through the declarative section start to finish; the staging recipe, the example YAML, and the train+test install commands should all reference compatible paths and names.
  • mint broken-links clean.

🤖 Generated with Claude Code


Note

Low Risk
Documentation-only edits to prepare-dataset.mdx with no runtime or security behavior changes.

Overview
The declarative Prepare Data path in prepare-dataset.mdx is tightened so reviewers can follow staging → YAML → Helm without bouncing to external READMEs for the default case.

Section 2 (staging) now documents the single-node default inline: copy into ~/.tracebloc/<workspace>/data/<prefix> and reference /data/shared/<prefix>/... in ingest.yaml. kubectl cp / init-container staging is relegated to a Note for multi-node or EKS.
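A minimal sketch of the single-node staging step described above, assuming the dataset sits in a local directory. The values `my-workspace`, `cats-dogs`, and the `TRACEBLOC_HOME` override are hypothetical and exist only so the sketch can be tried in a scratch directory; the `~/.tracebloc/<workspace>/data/<prefix>` and `/data/shared/<prefix>/...` paths are the ones named in this PR.

```shell
set -eu

# Hypothetical values for illustration; substitute your own.
WORKSPACE="${WORKSPACE:-my-workspace}"
PREFIX="${PREFIX:-cats-dogs}"
# Assumption: TRACEBLOC_HOME is not a documented variable. It defaults to
# ~/.tracebloc and is overridable only to make the sketch easy to test.
TRACEBLOC_HOME="${TRACEBLOC_HOME:-$HOME/.tracebloc}"

# stage_dataset SRC_DIR
# Copies the contents of SRC_DIR into the host path for this workspace and prefix.
stage_dataset() {
  src="$1"
  dest="$TRACEBLOC_HOME/$WORKSPACE/data/$PREFIX"
  [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
  mkdir -p "$dest"
  # "src/." copies the directory's contents (dotfiles included), not the directory itself.
  cp -R "$src/." "$dest/"
  echo "staged to $dest"
  echo "reference in ingest.yaml as /data/shared/$PREFIX/..."
}
```

Calling `stage_dataset ./my-dataset` would copy the files and print the in-cluster path prefix to use in `ingest.yaml`. Multi-node or EKS clusters need the `kubectl cp` / init-container route instead, which this sketch does not cover.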

Section 4 (install) leads with the train/test rule, shows two helm install examples (cats-dogs-train / cats-dogs-test with separate config files), and states that CLIENT_ID / CLIENT_PASSWORD come from the parent tracebloc/client Secret—nothing to pass on the Helm CLI.
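The train/test pair can be sketched roughly as follows. Only the release names `cats-dogs-train` and `cats-dogs-test` come from this PR; the chart reference `tracebloc/ingestor`, the use of `<workspace>` as the namespace, the config file names, and passing the config with `-f` are assumptions made for illustration, so check them against the rendered docs before use. By default the sketch prints the commands instead of running them.

```shell
set -eu

# Hypothetical values for illustration; substitute your own.
WORKSPACE="${WORKSPACE:-my-workspace}"
# Assumption: the ingestor is installed from this chart reference.
CHART="${CHART:-tracebloc/ingestor}"
# DRY_RUN=1 (the default) prints each command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_ingest RELEASE CONFIG_FILE
# One Helm release per dataset split. Assumption: the per-run config is
# passed as a values file with -f.
install_ingest() {
  run helm install "$1" "$CHART" --namespace "$WORKSPACE" -f "$2"
}

# Run twice: once for the training split, once for the test split.
# Each (hypothetical) file would carry its own table: and intent: values.
install_ingest cats-dogs-train ingest-train.yaml
install_ingest cats-dogs-test ingest-test.yaml
```

No credentials appear on either command line, which matches the PR's statement that `CLIENT_ID` / `CLIENT_PASSWORD` come from the Secret created by the parent `tracebloc/client` chart.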

Reviewed by Cursor Bugbot for commit c7d1118.

Fixes the docs side of data-ingestors#131:

- B2 / B1: section 2 of the declarative path linked out for the
  staging recipe and described `kubectl cp` while the Detailed Setup
  section (further down on the same page) prescribed a host-path
  `cp -R`. Replaced section 2 with an inline host-path recipe that
  matches the Detailed Setup section, and demoted `kubectl cp` to a
  Note for multi-node / EKS deployments. The recipe now uses a
  `<prefix>` subdirectory so the path lines up with the
  `/data/shared/<prefix>/...` style used in ingest.yaml examples.
- C2: section 4 was silent on where CLIENT_ID / CLIENT_PASSWORD come
  from in the declarative path. Added a sentence noting the ingestor
  Pod inherits them from the Kubernetes Secret the parent
  tracebloc/client chart creates in <workspace> at install time —
  no creds are passed on the `helm install` line.
- C5: section 4 mentioned the run-twice rule only as a trailing
  parenthetical. Promoted it to bolded prose and added a worked
  train + test pair (two `helm install` invocations, distinct
  release names + `table:` + `intent:`) so the rule is concrete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

divyasinghds self-assigned this Jun 1, 2026

@LukasWodka (Contributor)

👋 Heads-up — Code review queue is at 11 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

@mintlify (bot, Contributor) commented Jun 1, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| tracebloc | 🟢 Ready | View Preview | Jun 1, 2026, 9:46 AM |


## Overview

Make your data available to the Kubernetes cluster so it can be used for training and evaluation. Regardless of where your client runs on Azure, AWS, Google Cloud, or a local Minikube setup, the process of ingesting datasets works the same way.

Please change to "Whether your client runs on Azure, AWS, Google Cloud ...". The current sentence is cumbersome and harder to follow.

The `tracebloc/client` parent chart bootstraps the cluster (jobs-manager, MySQL, RBAC). The `tracebloc/ingestor` subchart submits per-dataset ingestion runs against it.

<Note>
If you installed the client via the one-liner (`bash <(curl -fsSL https://tracebloc.io/i.sh)`), use `--reset-then-reuse-values` so the helm upgrade doesn't drop the values the installer applied:

@aptracebloc commented Jun 1, 2026

This note should not come after the commands the user can run; otherwise they only find out later that there are side effects, for example from not setting the `--reset-then-reuse-values` flag. It should be at the beginning of the section.
@saadqbal WDYT?

3 participants