tool: sync WRB soil descriptions from the mobile i18n into the DB #366
Merged
d9e5886 to c5b1497
…90_desc

The mobile client keeps hand-cleaned, multi-language copies of the WRB soil Description/Management narratives in its i18n files; the backend reads the same narratives from the wrb_fao90_desc table, where the non-English columns are largely English placeholders and the text carries literal <br> tags. The table also stores a translated name per language (wrb_tax_<lang>) and is keyed by wrb_tax (= the English name, which the soil-ID algorithm matches against).

scripts/wrb_descriptions_sync.py compares the two (default: a read-only HTML word-diff with <br>/whitespace normalized out) and, with --write, rebuilds the table from the JSON source of truth in one transaction: widens the varchar(2000) description/management columns to text, adds columns for new languages, deletes all rows, and reinserts one row per JSON soil (name + description + management per language). A guard refuses to rebuild if the JSON does not cover every soil the DB or algorithm (HWSD2 fao90_name) still needs.

Languages are auto-discovered from the translation files; the only column-suffix mismatch (sw -> ks) lives in DB_SUFFIX_EXCEPTIONS. scripts/README.md documents the tool and a runbook for the planned ks -> sw rename that would remove even that exception.

Stdlib (difflib/html) + psycopg only; run where the soil-id DB is reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
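The coverage guard described above can be pictured as a set comparison. This is a minimal sketch, not the script's actual code: the function names `missing_soils` and `check_coverage` are hypothetical, and it assumes the JSON keys, the DB's wrb_tax values, and the HWSD2 fao90_name values have already been mapped to the same English soil names.

```python
def missing_soils(json_soils, db_soils, algorithm_soils):
    """Return the soils that the DB or algorithm needs but the JSON lacks."""
    required = set(db_soils) | set(algorithm_soils)
    return sorted(required - set(json_soils))


def check_coverage(json_soils, db_soils, algorithm_soils):
    """Abort before any destructive rebuild if the JSON is incomplete."""
    missing = missing_soils(json_soils, db_soils, algorithm_soils)
    if missing:
        # Hypothetical message; the real tool's wording is not shown in the PR.
        raise SystemExit(f"refusing to rebuild: JSON lacks {', '.join(missing)}")
```

Running a check like this before the delete-and-reinsert step matters because the rebuild removes every row: a soil missing from the JSON would otherwise silently disappear from the table.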
c5b1497 to 7b5b02a
knipec approved these changes Jun 11, 2026
The backend only ever surfaces the English description/management, so the soil-id library now fetches just that: get_WRB_descriptions / getSG_descriptions select WRB_tax, Description_en, Management_en, and global_soil builds an English-only siteDescription. The other languages stay in the table (kept current by wrb_descriptions_sync, staged for a possible future multilingual API) but are no longer read.

With the read path decoupled from the non-English columns, the sync drops the sw->ks exception entirely (the DB_SUFFIX_EXCEPTIONS map is gone) and writes Swahili to the correctly ISO-coded description_sw; the obsolete description_ks columns are left in place (now nullable, all NULL) for a window-free DROP later -- see scripts/README.md.

Global snapshots regenerated to the English-only siteDescription. The English text still carries <br> here because it matches the current soil-id-db image; it becomes clean once that image is rebuilt from the synced data.

(Includes a ruff-pre-commit version bump applied by the repo's update hook.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
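To make the effect of dropping DB_SUFFIX_EXCEPTIONS concrete, here is a small sketch of how a language code could map to column names with and without such an exception map. The helper `columns_for` and the `FIELDS` tuple are hypothetical; only the `wrb_tax_<lang>` and `description_<lang>` patterns appear in the commit messages, and `management_<lang>` is assumed by analogy with Management_en.

```python
# Per-language column families (assumed; lower-cased for readability).
FIELDS = ("wrb_tax", "description", "management")


def columns_for(lang, suffix_exceptions=None):
    """Return the per-language column names for one language code."""
    suffix = (suffix_exceptions or {}).get(lang, lang)
    return [f"{field}_{suffix}" for field in FIELDS]


# Before this commit: Swahili was redirected to the legacy "ks" columns.
print(columns_for("sw", {"sw": "ks"}))
# After this commit: no exception map, so the ISO code is used directly.
print(columns_for("sw"))
```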
06cff74 to 6e7ee55
johannesparty added a commit to techmatters/terraso-backend that referenced this pull request Jun 11, 2026

…ath)

Pins soil-id to the tagged release where the WRB description read path is English-only (techmatters/soil-id-algorithm#366): the library fetches only Description_en/Management_en. soil-id's install_requires is unchanged from the prior pin, so only the soil-id URL + archive hash move; the rest of the lock is untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
johannesparty added a commit to techmatters/terraso-backend that referenced this pull request Jun 12, 2026

…2053)

* feat: expose global soil description & management via API and exports

Global (WRB/HWSD) soil matches carry description and management guidance as a multilingual dict in the soil-id output, but resolve_soil_info only forwarded a plain string (the US brief_narrative) and nulled the dict, so global matches returned a null description and no management text. Now:

- resolve_soil_info returns the English description for both US (string) and global (Description_en) matches, plus a new management field from the global Management_en (None for US).
- normalize_soil_description() strips the <br> tags the global DB stores and collapses whitespace, matching the clients' hand-cleaned copies.
- SoilSeries gains a management field; fullDescriptionUrl is annotated US only (it is the SoilWeb series URL, absent for global matches).
- The export soil-id query requests management; CSV export gains Selected soil management / Top soil match management columns beside the description ones (JSON carries it automatically via passthrough).

CSV snapshot fixtures regenerated with the new columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor: drop <br> normalization, return raw English description/management

normalize_soil_description stripped <br> tags and collapsed whitespace from the WRB descriptions, because the soil-id DB stored them with literal <br>. We're now replacing those DB strings with the clean mobile-i18n copies (no <br>) via the wrb_descriptions_sync tool, so the normalization is redundant: resolve_soil_info returns Description_en / Management_en (and the US brief_narrative) verbatim.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* build: repin soil-id to 2026-06-11.1 (English-only description read path)

Pins soil-id to the tagged release where the WRB description read path is English-only (techmatters/soil-id-algorithm#366): the library fetches only Description_en/Management_en. soil-id's install_requires is unchanged from the prior pin, so only the soil-id URL + archive hash move; the rest of the lock is untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
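For reference, the normalization that the first commit introduced and the second one dropped amounts to roughly the following. Only the function name comes from the commit message; the body is an assumed sketch, not the backend's actual implementation, and the choice to pass None through unchanged is also an assumption.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
_WHITESPACE = re.compile(r"\s+")


def normalize_soil_description(text):
    """Replace <br> tags with spaces, then collapse runs of whitespace."""
    if text is None:
        return None
    without_breaks = _BR_TAG.sub(" ", text)
    return _WHITESPACE.sub(" ", without_breaks).strip()
```

Once the DB strings themselves are clean, a function like this becomes a no-op on every input, which is why returning the stored text verbatim is equivalent.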
The mobile client keeps hand-cleaned, multi-language copies of the WRB soil Description/Management narratives in its i18n files (`soil.match_info.<key>.{description,management}`). The backend reads the same narratives from `wrb_fao90_desc`, where, it turns out, the non-English columns (`fr`, `ks`) are largely English placeholders, `es` is partial, and the text carries literal `<br>` tags.

`scripts/wrb_descriptions_sync.py`:

- Default: a read-only HTML word-diff with `<br>`/whitespace normalized out so only genuine wording/translation differences show; introspects which language columns exist.
- `--write`: replaces the DB values from the JSON source of truth in one transaction. It widens the `varchar(2000)` columns to `text`, adds columns for new languages (`ka`/`uk`), rewrites every in-both soil, and emits the applied SQL for review.

Stdlib (`difflib`/`html`) + `psycopg` only. Run where the soil-id DB is reachable (e.g. inside the backend container).

After `--write`, regenerate and redistribute the `soil-id-db` dump (`make dump_soil_id_db` → `make build_docker_image` → push).

`gypsisols` is in the app i18n but has no DB row (skipped); the `sw`→`ks` mapping is the existing Kiswahili column.

🤖 Generated with Claude Code
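The default read-only comparison can be approximated with the standard library alone. The sketch below is an assumption about how such a word-diff might be built with `difflib` and `html`; the helper names `words` and `word_diff_html` and the `<del>`/`<ins>` output markup are hypothetical and not taken from the script.

```python
import difflib
import html
import re


def words(text):
    """Drop <br> tags, unescape HTML entities, and split on whitespace."""
    without_breaks = re.sub(r"<br\s*/?>", " ", text, flags=re.IGNORECASE)
    return html.unescape(without_breaks).split()


def word_diff_html(db_text, json_text):
    """Return an HTML fragment marking words removed from or added to the DB text."""
    old, new = words(db_text), words(json_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(" ".join(old[i1:i2])))
            continue
        if tag in ("delete", "replace"):
            parts.append(f"<del>{html.escape(' '.join(old[i1:i2]))}</del>")
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{html.escape(' '.join(new[j1:j2]))}</ins>")
    return " ".join(parts)
```

Because both sides are reduced to word lists before comparison, a DB string that differs from the i18n copy only by `<br>` tags or spacing produces no `<del>`/`<ins>` markers at all.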