Block bots from crawling Mintlify static assets #45
Merged
divyasinghds merged 5 commits on May 29, 2026
Prod: enhance documentation and CI for training workflows and templates
* docs: consolidate Docker build into single multi-arch command (#35)

  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: remove open-source client claim from how-training-works (#36)

  * docs: remove open-source client claim from how-training-works

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

  * docs: say "contact us" instead of "open a support ticket"

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

  ---------

  Co-authored-by: Asad Iqbal (Saadi) <asad.dsoft@gmail.com>
  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add automated upstream sync workflow (#27)

  * docs: add automated upstream sync workflow

    Adds a Claude-powered workflow that syncs docs pages with upstream README changes from five source repos (tracebloc-py-package, client, start-training, data-ingestors, model-zoo). Source repos fire repository_dispatch on push; this repo's workflow fetches the upstream file, has Claude rewrite the target .mdx in docs voice, and opens a PR.

    - .github/sync-sources.yml: mapping of upstream files to docs pages
    - .github/workflows/sync-docs.yml: dispatch + manual + cron-driven sync job
    - .github/notify-docs.workflow-template.yml: template for source repos

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

  * fix: address bugbot issues in sync workflow

    - Pass ANTHROPIC_API_KEY as anthropic_api_key input to claude-code-action instead of env var (action reads via core.getInput, not env).
    - Move sync cache from .sync-cache/ to /tmp/sync-cache/ so untracked cache files are not picked up by create-pull-request.

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  * fix: address remaining bugbot issues in sync workflow

    - Add concurrency group so overlapping cron/dispatch/manual runs serialize instead of racing on the docs/sync-upstream branch (would otherwise fail with "failed to push some refs" and drop changes from the losing run).
    - Pin yq to v4.44.3 instead of latest for deterministic builds.
    - Restrict create-pull-request add-paths to **/*.mdx so stray edits outside docs pages cannot be staged into the sync PR.
    - Note in the notify template that branches may need adjusting for repos using master (e.g. data-ingestors).

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  * fix: accumulate sync runs onto existing PR branch

    Previously each run checked out the default branch fresh and force-pushed only the dispatched source's diff to docs/sync-upstream, silently overwriting any earlier dispatched sources' pending changes.

    Now the workflow:

    - Checks if docs/sync-upstream exists on the remote; if so, checks it out so prior accumulated changes are part of the working tree.
    - Resolves the default branch dynamically and passes it to peter-evans as the explicit base so the PR continues targeting the right branch even after we switched off it.

    Result: sequential dispatches for different sources combine into one PR instead of clobbering each other.

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  * fix: read sync-sources.yml from base branch, not stale sync branch

    After the previous fix switched the working tree to docs/sync-upstream to accumulate changes, all subsequent reads of .github/sync-sources.yml were coming from the (potentially stale) sync branch instead of the base branch. If a new source were added or an instruction edited on main while a sync PR was pending, the workflow would silently use the outdated config.

    Snapshot the mapping to /tmp/sync-sources.yml before any branch switch, and point both the yq filter step and the Claude prompt at the snapshot.

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  Co-authored-by: Asad Iqbal <asad.dsoft@gmail.com>

* docs: migrate join-use-case API examples to snake_case (tracebloc_package 0.7.0) (#14)

  Sync hyperparameters and start-training pages with the legacy tracebloc/documentation repo:

  - Rename camelCase API methods to snake_case: upload_model, link_model_dataset, experiment_name, get_training_plan, learning_rate, loss_function, layers_freeze, early_stop_callback, reduce_lr_callback, model_checkpoint_callback, terminate_on_nan_callback, training_classes, data_type
  - Rename trainingObject → training
  - Update terminate-on-NaN description (any NaN loss)
  - Use pip optional-extras syntax: tracebloc_package[pytorch|tensorflow|sklearn|all]

  Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: migrate SDK page to tracebloc 0.8.x (#39)

  * docs: migrate SDK page to tracebloc 0.8.x (closes #38)

    The SDK was renamed in tracebloc/tracebloc-py-package#135. `tracebloc==0.8.1` is live on PyPI. Migrating Mintlify docs to the canonical name.

    - Rename `tools-help/tracebloc-package.mdx` -> `tools-help/tracebloc.mdx`.
    - Rewrite the page: `tracebloc` install + import, snake_case API (post-SDK.2), historical Note about the rename, link to redirect package on PyPI.
    - Bump install pin to `>=0.8.0` (was `>=0.6.32`); add per-extra install options.
    - `docs.json`:
      - Nav: `tools-help/tracebloc-package` -> `tools-help/tracebloc`.
      - Add `/tools-help/tracebloc-package` -> `/tools-help/tracebloc` redirect to preserve old inbound links.
      - Existing redirects pointing at `/tools-help/tracebloc-package` now point at `/tools-help/tracebloc`.
    - Internal cross-links in faqs.mdx + key-terms.mdx -> new URL.
    - `join-use-case/start-training.mdx` install snippet -> new name + pin.

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  * docs: bump install pin to 0.8.1 (latest)

  * docs: migrate SDK examples to snake_case API (post-SDK.2)

    The 0.7.0 SDK.2 release renamed the public Python API to PEP 8 / snake_case. The old camelCase forms still work via deprecation aliases (with DeprecationWarning) but new examples should use the canonical names.

    Updates the three customer-facing pages that still showed the camelCase API:

    - `join-use-case/start-training.mdx`: the main walk-through.
    - `join-use-case/hyperparameters.mdx`: the full reference table.
    - `join-use-case/model-optimization.mdx`: pretrained-weights upload.

    Method renames applied (per tracebloc-py-package/MIGRATION.md):

    - `uploadModel` -> `upload_model` (+ `model_name=` kwarg)
    - `linkModelDataset` -> `link_model_dataset` (+ `dataset_id=` kwarg)
    - `getTrainingPlan` -> `get_training_plan`
    - `experimentName` -> `experiment_name`
    - `learningRate` -> `learning_rate`
    - `lossFunction` -> `loss_function`
    - `layersFreeze` -> `layers_freeze`
    - `earlystopCallback` -> `early_stop_callback`
    - `reducelrCallback` -> `reduce_lr_callback`
    - `modelCheckpointCallback` -> `model_checkpoint_callback`
    - `terminateOnNaNCallback` -> `terminate_on_nan_callback`
    - `trainingClasses` -> `training_classes`
    - `dataType` -> `data_type`

    The `model_name` and `dataset_id` keyword names are no longer aliased in 0.8.x: passing positional args still works, but the kwargs `modelname=` / `datasetId=` raise TypeError, so the docs use the explicit kwarg form everyone should adopt.

    Also renames the local variable `trainingObject` -> `training_plan` throughout, matching the canonical sample workflow in tracebloc's project CLAUDE.md.

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  * fix(ci): update sync-sources.yml dest after file rename (bugbot)

    `tools-help/tracebloc-package.mdx` was renamed to `tools-help/tracebloc.mdx` earlier in this PR, but the daily `sync-docs.yml` cron reads `.github/sync-sources.yml` and would have either recreated the old orphan path or failed outright, silently preventing upstream README edits from reaching the new page.

    Repointing the dest at `tools-help/tracebloc.mdx` keeps the upstream README -> docs page sync working. The mapping `id` stays `tracebloc-package` (it's a slug used for dispatch; changing it would need a coordinated edit in the upstream notify workflow, which doesn't exist yet; scope creep here).

    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  ---------

  Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): correct upstream refs in sync-sources.yml

  Three entries pointed at `main` branches that do not exist in the upstream repos, which would cause the sync fetch step to 404:

  - tracebloc-py-package → develop (default; `main` does not exist; per the SDK repo's CLAUDE.md, develop is the canonical source of truth)
  - data-ingestors → master (default branch)
  - model-zoo → master (default branch)

  Verified against the GitHub API for each repo.

  The `Readme.md` casing flagged by bugbot is correct as-is: data-ingestors actually ships `Readme.md` (mixed case), so the bugbot suggestion would have broken the fetch; left unchanged.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): pass source_id to yq via strenv() to avoid query injection

  Bugbot flagged the previous `yq ".sources[] | select(.id == \"$target\")"` pattern as shell-injectable. The specific RCE described doesn't actually trigger: `DISPATCH_ID` / `INPUT_ID` are routed through `env:` (Actions best practice) and bash does not re-tokenize variable values inside double quotes, so `$()`, backticks, and `;` in the value remain literal.

  However, a `"` in the value would still terminate the yq string literal at the yq parser level and could yield a malformed query or unintended filter. Routing the value through `strenv(TARGET)` keeps it entirely out of the yq expression syntax: defense in depth at zero cost.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Asad Iqbal (Saadi) <asad.dsoft@gmail.com>
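The difference between the two yq patterns in the last commit message can be sketched in Python. This is only an illustration, not the workflow's actual step: the helper names and the sample value are hypothetical, while the filter expressions, the `TARGET` variable, and the `/tmp/sync-sources.yml` snapshot path are taken from the commit messages above. The sketch builds the command lines without running yq.

```python
from __future__ import annotations

import os

# Snapshot path named in the commit messages; helper names below are hypothetical.
SYNC_SOURCES = "/tmp/sync-sources.yml"


def interpolated_yq_args(target: str) -> list[str]:
    """Old pattern: the value is spliced into the yq expression text."""
    return ["yq", f'.sources[] | select(.id == "{target}")', SYNC_SOURCES]


def strenv_yq_invocation(target: str) -> tuple[list[str], dict[str, str]]:
    """New pattern: the expression is constant and the value travels via the environment."""
    env = {**os.environ, "TARGET": target}
    args = ["yq", ".sources[] | select(.id == strenv(TARGET))", SYNC_SOURCES]
    return args, env


if __name__ == "__main__":
    # Hypothetical value containing a double quote.
    value = 'a" or .id == "b'

    # The quote closes the string literal early, so the value becomes query syntax.
    print(interpolated_yq_args(value)[1])

    # The expression never contains the value, whatever characters it holds.
    args, env = strenv_yq_invocation(value)
    print(args[1])
```

With the interpolated form, the sample value turns one comparison into two, which is the "unintended filter" case the commit describes. With the `strenv(TARGET)` form, the expression text is identical for every input.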
docs: clarify setup guide deploys single-node workspace
…43) (#44)

Two changes to the prepare-data and setup-guide pages driven by user feedback after a fresh end-to-end setup:

- prepare-dataset.mdx: lead with the declarative YAML method (helm install tracebloc/ingestor --set-file ingestConfig=./ingest.yaml). The existing Python-template + Docker + kubectl flow stays as the advanced path for users who need custom processors. Calls out that ingest.yaml fields vary per category and points at the per-category examples in the data-ingestors repo.
- setup-guide.mdx: add a Note after the curl one-liner pointing at the helm upgrade command (--reset-then-reuse-values, --version) so users know how to upgrade an installer-deployed client without losing applied values.

Co-authored-by: Asad Iqbal (Saadi) <asad.dsoft@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-applied on top of current main (original branch fix/robots-txt-block-static-assets was cut from an old initial-commit state and rebasing produced unrelated conflicts in favicon/logo/.mintignore/docs.json).

Clarity data shows bots (Apple, OpenAI, Google) spending ~200 requests/week on /mintlify-assets/_next/static/ JS/CSS chunks. These have zero SEO value.

Adds custom robots.txt that blocks /mintlify-assets/ while keeping the existing /cdn-cgi/ block and sitemap reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-opens #3 from a fresh branch off current `main`. The original branch was cut from an old "Initial commit" state, and rebasing produced unrelated conflicts (`favicon.svg`, `logo/light.svg`, `.mintignore`, `docs.json`). The link-rot check on #3 was failing on stale broken refs (`getting-started/quick-setup`, `images/tracebloc-workflow-overview.png`) that no longer exist on `main`.

Summary
Adds a custom `robots.txt` to block bots from crawling Mintlify static asset bundles.

Context
Clarity bot traffic data (Apr 10-16) shows ~200 bot requests/week hitting `/mintlify-assets/_next/static/` JS/CSS chunks. These have zero SEO value and waste crawl budget.

Changes
New `robots.txt` file:
- Blocks `/mintlify-assets/` (JS/CSS/font bundles)
- Keeps the existing `/cdn-cgi/` block

Test plan
- `docs.tracebloc.io/robots.txt` returns updated rules after deploy

Supersedes #3 (approved by @saadqbal x2).
🤖 Generated with Claude Code
Note
Low Risk
Docs-only crawl policy; no application code, auth, or data handling changes.
Overview
Adds a custom `robots.txt` for the docs site so crawlers skip low-value static paths while documentation pages remain open to `User-agent: *`.

New disallow rules: `/cdn-cgi/` (unchanged intent from prior setup) and `/mintlify-assets/` (Next/Mintlify JS, CSS, and font bundles that were drawing bot traffic without SEO benefit).

Sitemap still points at `https://docs.tracebloc.io/sitemap.xml`.

Reviewed by Cursor Bugbot for commit 23dd9aa. Bugbot is set up for automated code reviews on this repo.
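A quick way to sanity-check the rules described above is Python's standard `urllib.robotparser`. The sketch below is an assumption-laden reconstruction: the `ROBOTS_TXT` text is rebuilt from the summary (the merged file may differ in ordering or comments), the helper names are made up for illustration, and the asset chunk path is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents, reconstructed from the rules in the PR summary;
# this is not a copy of the merged robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cdn-cgi/
Disallow: /mintlify-assets/

Sitemap: https://docs.tracebloc.io/sitemap.xml
"""

BASE_URL = "https://docs.tracebloc.io"


def build_parser(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` on the docs site."""
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    parser = build_parser()
    # Hypothetical chunk path under the blocked asset prefix.
    print(is_allowed(parser, "/mintlify-assets/_next/static/chunks/main.js"))
    # A documentation page stays crawlable.
    print(is_allowed(parser, "/tools-help/tracebloc"))
    # Sitemap entries found in the file (requires Python 3.8+).
    print(parser.site_maps())
```

To check the deployed file instead of the reconstructed text, `RobotFileParser.set_url()` followed by `read()` fetches `https://docs.tracebloc.io/robots.txt` directly. That matches the test-plan item but needs network access.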