Skip to content

fix(waterdata): drop id from wire properties so daily/continuous don't 400#323

Draft
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:fix/waterdata-properties-id
Draft

fix(waterdata): drop id from wire properties so daily/continuous don't 400#323
thodson-usgs wants to merge 1 commit into
DOI-USGS:mainfrom
thodson-usgs:fix/waterdata-properties-id

Conversation

@thodson-usgs

Copy link
Copy Markdown
Collaborator

Problem (CI is red on main)

The live daily and continuous integration tests began failing around Jun 11–12, independent of any code change:

  • test_get_daily_properties / _id400: InvalidParameterValue. unknown properties specified
  • test_get_continuousassert 'geometry' not in df.columns (geometry is present)

These are live-API drift, not a regression in the repo (main's last green CI ran Jun 10; the API changed after). They red every PR's ubuntu matrix until fixed.

Root cause

I probed the live API to characterize the change:

collection properties=id,value properties=value
daily 400 200
continuous 400 200
latest-daily 200 200
  1. id is no longer a selectable property on daily/continuous (the feature id is always returned anyway). _switch_properties_id translated the user-facing id column (daily_id) to id and kept it in the wire properties, so requests that listed the id column started 400ing. (latest-daily still tolerates id, which is why the CQL test kept passing.)
  2. Geometry is now included by default when skipGeometry is unset — which the library already documents (skip_geometry=None "leaves skipGeometry unset (the server defaults to including geometry)").

Fix

  • _switch_properties_id now drops every id alias (id, the service id like daily_id, and its singular form) plus geometry from the wire properties, instead of translating to id and keeping it. The feature id always comes back and is renamed to the service id column in _arrange_cols, so it never needs to be requested. This matches the R dataRetrieval package's switch_properties_id (which removes id/geometry).
    • User-facing behavior is unchanged: callers still pass daily_id/id in properties and get the same output columns — only the request the chunker sends is corrected.
  • test_get_continuous: updated the stale geometry assertion to match the documented default (server includes geometry when skip_geometry is unset). The user-facing default is intentionally not changed.

Note on consistency (raised in review)

The R package excludes geometry for continuous by forcing skipGeometry = TRUE in read_waterdata_continuous (high-frequency data → geometry per row is pure bloat). Python's get_continuous instead defaults skip_geometry=None (server default). Aligning Python would be a user-facing default change, which is deliberately out of scope here — flagging it as a possible follow-up enhancement.

Verification

The 3 originally-failing tests pass against the live API, and the related id/geometry tests stay green (test_get_daily_no_geometry, test_get_cql_properties_id_translation, test_get_latest_daily_properties_geometry, …). Full waterdata_test.py (75) plus waterdata_utils/filters/chunking suites pass; ruff and mypy --strict dataretrieval/ clean.

🤖 Generated with Claude Code

…s don't 400

The USGS OGC API now rejects the feature `id` as a selectable `properties`
value on the `daily` and `continuous` collections (HTTP 400
"InvalidParameterValue. unknown properties specified"; `latest-daily` still
accepts it). `_switch_properties_id` translated the user-facing id column
(e.g. `daily_id`) to `id` and *kept* it in the wire `properties` list, so any
call passing the id column in `properties` started failing — breaking
`test_get_daily_properties` and `test_get_daily_properties_id`.

The feature `id` is always returned and is renamed to the service-specific id
column in `_arrange_cols`, so it never needs to be requested. Drop every id
alias (`id`, the service id like `daily_id`, and its singular form) plus
`geometry` (controlled by `skip_geometry`) from the wire list, matching the R
dataRetrieval package's `switch_properties_id`. User-facing behavior is
unchanged: callers still pass `daily_id`/`id` in `properties` and get the same
output columns; only the request the chunker sends is corrected.

Also update the stale `test_get_continuous` geometry assertion: with
`skip_geometry` unset the server now includes geometry by default, which the
library already documents ("the server defaults to including geometry"). The
user-facing default is intentionally left unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 12, 2026
Ports the NGWMN functions from the R dataRetrieval PR
(DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC
machinery into a shared engine so NGWMN and Water Data are sibling layers on
top of it rather than NGWMN depending on Water Data.

Architecture
------------
  dataretrieval/ogc/        generic OGC engine (no API-specific config):
    chunking.py             (moved from waterdata/) the multi-value chunker
    filters.py              (moved) cql-text filter splitting
    progress.py             (moved from waterdata/_progress.py)
    engine.py               request build, paginate, parse, finalize, the
                            chunked get_ogc_data entry point, arg handling
  dataretrieval/waterdata/  thin Water Data layer on the engine:
    utils.py                service->id map, stats API path, profile checks,
                            WATERDATA_DIALECT, and a get_ogc_data wrapper that
                            injects the Water Data defaults (re-exports engine
                            symbols so api.py/ratings.py are unchanged)
  dataretrieval/ngwmn.py    sibling module: get_sites, get_water_level,
                            get_lithology, get_well_construction, get_providers
                            — imports the engine from dataretrieval.ogc only

The engine is API-agnostic: `get_ogc_data(args, service, output_id, *,
base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services,
date_only_services)` (threaded via a context variable, like the base-url
context) carries the per-API quirks — Water Data POSTs CQL2 for
monitoring-locations and renders `daily` time args date-only; NGWMN needs
neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero
`dataretrieval.waterdata` dependency.

NGWMN response-shape fixes in the engine (the NGWMN API differs from the main
one): key the empty-result short-circuit off `features` rather than the
`numberReturned` NGWMN omits; and tolerate observation features that carry no
`geometry` key.

PEP naming: the engine now snake_cases any non-snake column in finalize, so the
package always returns PEP-8 column names regardless of the upstream API
(a no-op today since both APIs are already snake_case, but enforced).

Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a
`_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a
module-level fixture activates WATERDATA_DIALECT for the direct
_construct_api_requests unit tests. 285 unit tests pass; mypy --strict and
ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API
drift failures (fixed by DOI-USGS#323), unrelated to this change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Jun 12, 2026
Ports the NGWMN functions from the R dataRetrieval PR
(DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC
machinery into a shared engine so NGWMN and Water Data are sibling layers on
top of it rather than NGWMN depending on Water Data.

Architecture
------------
  dataretrieval/ogc/        generic OGC engine (no API-specific config):
    chunking.py             (moved from waterdata/) the multi-value chunker
    filters.py              (moved) cql-text filter splitting
    progress.py             (moved from waterdata/_progress.py)
    engine.py               request build, paginate, parse, finalize, the
                            chunked get_ogc_data entry point, arg handling
  dataretrieval/waterdata/  thin Water Data layer on the engine:
    utils.py                service->id map, stats API path, profile checks,
                            WATERDATA_DIALECT, and a get_ogc_data wrapper that
                            injects the Water Data defaults (re-exports engine
                            symbols so api.py/ratings.py are unchanged)
  dataretrieval/ngwmn.py    sibling module: get_sites, get_water_level,
                            get_lithology, get_well_construction, get_providers
                            — imports the engine from dataretrieval.ogc only

The engine is API-agnostic: `get_ogc_data(args, service, output_id, *,
base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services,
date_only_services)` (threaded via a context variable, like the base-url
context) carries the per-API quirks — Water Data POSTs CQL2 for
monitoring-locations and renders `daily` time args date-only; NGWMN needs
neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero
`dataretrieval.waterdata` dependency.

NGWMN response-shape fixes in the engine (the NGWMN API differs from the main
one): key the empty-result short-circuit off `features` rather than the
`numberReturned` NGWMN omits; and tolerate observation features that carry no
`geometry` key.

PEP naming: the engine now snake_cases any non-snake column in finalize, so the
package always returns PEP-8 column names regardless of the upstream API
(a no-op today since both APIs are already snake_case, but enforced).

Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a
`_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a
module-level fixture activates WATERDATA_DIALECT for the direct
_construct_api_requests unit tests. 285 unit tests pass; mypy --strict and
ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API
drift failures (fixed by DOI-USGS#323), unrelated to this change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant