
feat: add NGWMN getters as an ogc sibling; extract a shared OGC engine #324

Draft
thodson-usgs wants to merge 1 commit into DOI-USGS:main from thodson-usgs:feat/ngwmn-ogc

@thodson-usgs thodson-usgs commented Jun 12, 2026

Collaborator

Ports the NGWMN functions from the R dataRetrieval PR (DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC machinery into a shared engine so NGWMN and Water Data are sibling layers on top of it — NGWMN does not depend on Water Data.

Architecture

dataretrieval/ogc/        generic OGC engine (no API-specific config)
  chunking.py             multi-value chunker      (moved from waterdata/)
  filters.py              cql-text filter splitting (moved)
  progress.py             progress reporting        (moved from waterdata/_progress.py)
  engine.py               request build · paginate · parse · finalize · get_ogc_data
dataretrieval/waterdata/  thin Water Data layer on the engine
  utils.py                service→id map · stats API · WATERDATA_DIALECT · get_ogc_data wrapper
dataretrieval/ngwmn.py    sibling: get_sites/get_water_level/get_lithology/
                          get_well_construction/get_providers  (imports only dataretrieval.ogc)

The engine is API-agnostic: get_ogc_data(args, service, output_id, *, base_url, extra_id_cols, dialect). An OgcDialect(cql2_services, date_only_services) (threaded via a context variable, like the base-url context) carries per-API quirks — Water Data POSTs CQL2 for monitoring-locations and renders daily time args date-only; NGWMN needs neither. Both ogc.engine and dataretrieval.ngwmn import with zero dataretrieval.waterdata dependency.

from dataretrieval import ngwmn

df, md = ngwmn.get_sites(state_name="Wisconsin")
df, md = ngwmn.get_water_level(
    monitoring_location_id="USGS-272838082142201",
    datetime=["2022-01-01", "2024-01-01"],
)
df, md = ngwmn.get_water_level(            # NGWMN ids aren't all USGS-prefixed
    monitoring_location_id=["USGS-272838082142201", "MBMG-702934"]
)
df, md = ngwmn.get_providers(state="WI")

The multi-value chunker (recently fixed in #322) is generic and applies to NGWMN unchanged — verified that a forced-small-budget multi-site NGWMN query chunks and unions correctly.
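
For readers unfamiliar with the chunker, the idea can be sketched roughly as below. This is not the code in `ogc/chunking.py`: `chunk_by_budget` and its character-count budget are assumptions made for illustration, and the real chunker may measure the budget differently.

```python
from __future__ import annotations

from typing import Iterator, Sequence


def chunk_by_budget(
    values: Sequence[str], budget: int, sep: str = ","
) -> Iterator[list[str]]:
    """Yield runs of values whose joined length stays within `budget` characters.

    A single value longer than the budget is still emitted, alone in its chunk.
    """
    chunk: list[str] = []
    size = 0
    for value in values:
        added = len(value) + (len(sep) if chunk else 0)
        if chunk and size + added > budget:
            yield chunk
            chunk, size = [], 0
            added = len(value)
        chunk.append(value)
        size += added
    if chunk:
        yield chunk


# A deliberately small budget forces one request per site id.
ids = ["USGS-272838082142201", "MBMG-702934"]
chunks = list(chunk_by_budget(ids, budget=25))
```

Each chunk would become its own request, and the resulting frames would be concatenated; flattening the chunks in order reproduces the original id list.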

Engine fixes (NGWMN's API differs from the main one)

  • Key the empty-result short-circuit off features rather than the numberReturned that NGWMN omits (otherwise pages with data were silently dropped).
  • Tolerate observation features with no geometry key (GeoDataFrame.from_features can't index a missing key).
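
The two fixes above can be sketched as follows. The helper names `page_is_empty` and `normalize_features` are hypothetical, and the engine's actual parsing code may differ; the sketch only shows the decision being keyed off `features` and a missing `geometry` key being filled with `None`.

```python
from __future__ import annotations

from typing import Any


def page_is_empty(payload: dict[str, Any]) -> bool:
    """Decide emptiness from `features` itself, not from `numberReturned`."""
    return not payload.get("features")


def normalize_features(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of the features, each guaranteed to have a `geometry` key."""
    return [
        {**feature, "geometry": feature.get("geometry")}
        for feature in payload.get("features") or []
    ]


# An NGWMN-style page: no numberReturned, and an observation without geometry.
page = {
    "type": "FeatureCollection",
    "features": [{"type": "Feature", "properties": {"value": 1.5}}],
}
features = normalize_features(page)
```

A check based on `payload.get("numberReturned", 0) == 0` would treat this page as empty even though it carries a feature, which is the silent-drop failure described above.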

PEP naming

The engine snake_cases any non-snake column in finalize, so the package always returns PEP-8 column names regardless of the upstream API — a no-op today (both APIs are already snake_case) but enforced going forward.
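
One common way to do this conversion is a pair of regex passes over the column names. The sketch below is an assumption about the approach, not the engine's `_to_snake_case`; the function name `to_snake_case` and the handling of spaces, hyphens, and dots are illustrative choices.

```python
import re

# Boundary before a capitalized word ("HTTPResponse" -> "HTTP_Response").
_WORD_BOUNDARY = re.compile(r"(.)([A-Z][a-z]+)")
# Boundary between a lowercase letter or digit and a capital ("locationId" -> "location_Id").
_CASE_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def to_snake_case(name: str) -> str:
    """Convert camelCase, PascalCase, or spaced names to snake_case."""
    name = re.sub(r"[\s\-.]+", "_", name.strip())
    name = _WORD_BOUNDARY.sub(r"\1_\2", name)
    name = _CASE_BOUNDARY.sub(r"\1_\2", name)
    return re.sub(r"_+", "_", name).lower()


columns = ["monitoringLocationId", "numberReturned", "already_snake", "Well Depth"]
renamed = [to_snake_case(c) for c in columns]
```

Names that are already snake_case pass through unchanged, which matches the "no-op today" behavior described above.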

Tests

Live NGWMN tests for all five getters (tests/ngwmn_test.py); a _to_snake_case unit test; mock.patch sites repointed to ogc.engine; a module fixture activates WATERDATA_DIALECT for the direct _construct_api_requests unit tests. 285 unit tests pass, mypy --strict and ruff clean.

Note

CI will show 3 pre-existing failures (test_get_daily_properties/_id, test_get_continuous) — the live-API drift fixed by #323, not introduced here (branch is off main). They go green once #323 merges.

🤖 Generated with Claude Code

@thodson-usgs changed the title on Jun 12, 2026
from: feat(waterdata): add NGWMN OGC getters (sites, water level, lithology, construction, providers)
to: feat: add NGWMN getters as an ogc sibling; extract a shared OGC engine

Ports the NGWMN functions from the R dataRetrieval PR
(DOI-USGS/dataRetrieval#904) and, per review, refactors the Water Data OGC
machinery into a shared engine so NGWMN and Water Data are sibling layers on
top of it rather than NGWMN depending on Water Data.

Architecture
------------
  dataretrieval/ogc/        generic OGC engine (no API-specific config):
    chunking.py             (moved from waterdata/) the multi-value chunker
    filters.py              (moved) cql-text filter splitting
    progress.py             (moved from waterdata/_progress.py)
    engine.py               request build, paginate, parse, finalize, the
                            chunked get_ogc_data entry point, arg handling
  dataretrieval/waterdata/  thin Water Data layer on the engine:
    utils.py                service->id map, stats API path, profile checks,
                            WATERDATA_DIALECT, and a get_ogc_data wrapper that
                            injects the Water Data defaults (re-exports engine
                            symbols so api.py/ratings.py are unchanged)
  dataretrieval/ngwmn.py    sibling module: get_sites, get_water_level,
                            get_lithology, get_well_construction, get_providers
                            — imports the engine from dataretrieval.ogc only

The engine is API-agnostic: `get_ogc_data(args, service, output_id, *,
base_url, extra_id_cols, dialect)`. An `OgcDialect(cql2_services,
date_only_services)` (threaded via a context variable, like the base-url
context) carries the per-API quirks — Water Data POSTs CQL2 for
monitoring-locations and renders `daily` time args date-only; NGWMN needs
neither. `ogc.engine` and `dataretrieval.ngwmn` both import with zero
`dataretrieval.waterdata` dependency.

NGWMN response-shape fixes in the engine (the NGWMN API differs from the main
one): key the empty-result short-circuit off `features` rather than the
`numberReturned` NGWMN omits; and tolerate observation features that carry no
`geometry` key.

PEP naming: the engine now snake_cases any non-snake column in finalize, so the
package always returns PEP-8 column names regardless of the upstream API
(a no-op today since both APIs are already snake_case, but enforced).

Tests: live NGWMN tests for all five getters (tests/ngwmn_test.py); a
`_to_snake_case` unit test; mock.patch sites repointed to ogc.engine; a
module-level fixture activates WATERDATA_DIALECT for the direct
_construct_api_requests unit tests. 285 unit tests pass; mypy --strict and
ruff clean. waterdata_test.py shows only the 3 known pre-existing live-API
drift failures (fixed by DOI-USGS#323), unrelated to this change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>