feat(dataset push): object_detection + keypoint_detection#15
aptracebloc
left a comment
Approve ✅ — object_detection + keypoint + schema re-sync
Brings the CLI to 9/10 modalities and resolves the schema-drift on the rest of the stack (this is the PR that re-syncs internal/schema/ingest.v1.json → Schema drift check is green here, and develop is clean once this lands).
Verified:
- object_detection: `annotations/` (Pascal VOC `.xml`) packaged via the shared sidecar walker; 9 files staged (4 img + 4 xml + labels.csv); reaches the ingestor's Pascal-VOC validator on a live run.
- keypoint_detection: spec synthesis validates against the re-synced schema (dry-run) with top-level `target_size` + `number_of_keypoints` (the keypoint conditional); missing `--number-of-keypoints` → exit 2.
- Embedded schema re-sync matches data-ingestors master (keypoint/MLM fields present).
Known (deployment, not CLI): keypoint can't ingest on the current deployed jobs-manager (older embedded schema rejects the top-level fields) — same ops follow-up as #14's MLM. The CLI's emission is correct for a current deployment.
Merge last in the stack (#12→#13→#14→#15); this one turns develop fully green.
…e-sync)
Adds the two remaining engine-supported image categories, taking the CLI
to 9/10 modalities (only semantic_segmentation remains, blocked on the
ingestor — data-ingestors#136).
- object_detection: reuses the generic sidecar walker for annotations/
(.xml). Validated live end-to-end — 128 records (bounding boxes)
ingested, rows confirmed in MySQL.
- keypoint_detection: labels.csv + images/ (keypoint coords live in the
CSV's Annotation column, read server-side). Adds --number-of-keypoints
(required; no default). Emits target_size + number_of_keypoints as
TOP-LEVEL fields, which the schema's keypoint conditional requires.
- Re-synced the embedded schema from data-ingestors develop. The vendored
copy was stale: it lacked keypoint's top-level target_size +
number_of_keypoints and their required-for-keypoint conditional, so the
CLI couldn't validate a keypoint spec at all. `ingest validate` and
dataset push now validate keypoint correctly.
Schema-skew findings (deployment/release hygiene, NOT CLI bugs):
* sync-schema.sh defaults to data-ingestors *master*, which is stale
(lacks both MLM and keypoint); the current schema is on *develop*.
Repoint the sync source to develop, or promote develop -> master.
(sync --check vs master flags this drift — pre-existing, surfaced here.)
* The deployed ingdemo client runs jobs-manager and the ingestor on
DIFFERENT schema versions: the ingestor (newer) REQUIRES keypoint's
top-level fields; jobs-manager (older) REJECTS them as additional
properties. So keypoint can't be ingested there until both components
are refreshed to a matching schema. The CLI's emission is correct
against the current/consistent schema (unit-verified). OD is
unaffected (no new fields).
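
To make the second finding concrete, here is a rough sketch of why no keypoint spec can pass both components while their schemas differ. It is a toy model under stated assumptions: `validateOlder`, `validateNewer`, and the `olderKnown` property set are hypothetical stand-ins for the two deployed schema versions, not the real validators.

```go
package main

import (
	"fmt"
	"sort"
)

// olderKnown is an assumed set of top-level properties known to the older
// schema; the real list is not shown in this PR.
var olderKnown = map[string]bool{"category": true}

// validateOlder mimics a schema that forbids additional properties.
func validateOlder(spec map[string]any) error {
	keys := make([]string, 0, len(spec))
	for k := range spec {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic error reporting
	for _, k := range keys {
		if !olderKnown[k] {
			return fmt.Errorf("additional property %q not allowed", k)
		}
	}
	return nil
}

// validateNewer mimics the keypoint conditional: when the category is
// keypoint_detection, both top-level fields become required.
func validateNewer(spec map[string]any) error {
	if spec["category"] != "keypoint_detection" {
		return nil
	}
	for _, k := range []string{"target_size", "number_of_keypoints"} {
		if _, ok := spec[k]; !ok {
			return fmt.Errorf("missing required property %q", k)
		}
	}
	return nil
}

func main() {
	with := map[string]any{
		"category":            "keypoint_detection",
		"target_size":         []int{224, 224},
		"number_of_keypoints": 17,
	}
	without := map[string]any{"category": "keypoint_detection"}

	fmt.Println("with fields    -> newer:", validateNewer(with), "| older:", validateOlder(with))
	fmt.Println("without fields -> newer:", validateNewer(without), "| older:", validateOlder(without))
}
```

In this model each spec is accepted by exactly one validator, which matches the description above: the fix belongs in the deployment (matching schema versions), not in what the CLI emits.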
Tests: push/image_extras_test.go (DiscoverObjectDetection +
missing-annotations); spec_test.go (OD emits annotations; keypoint emits
top-level target_size + number_of_keypoints; both pass the schema);
updated the unsupported-category gate test (now segmentation only).
go build / vet / test green.
Stacked on cli#14 (text) -> #13 (tabular) -> #12 (fixes).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
56c9d8d to 3595665
Summary
Adds object_detection and keypoint_detection; the CLI now covers 9 of 10 modalities (only `semantic_segmentation` remains, blocked on data-ingestors#136). OD is live-validated (128 records, rows in MySQL); keypoint is code-complete and correct against the current schema, but live-blocked by a deployment schema skew (below).
What's added
- object_detection: `annotations/` (Pascal VOC `.xml`). Live: staged images + annotations → Pascal-VOC + resolution validators passed → 128 bounding-box records ingested, confirmed in MySQL.
- keypoint_detection: `labels.csv` + `images/` (keypoint coords are in the CSV's `Annotation` column, read server-side). New `--number-of-keypoints` (required). Emits `target_size` + `number_of_keypoints` top-level; the schema's keypoint conditional requires them there.
- Embedded schema (`internal/schema/ingest.v1.json`): the vendored copy was stale, missing keypoint's top-level fields and the conditional, so the CLI couldn't validate a keypoint spec at all. Re-synced from data-ingestors `develop`.
- `sync-schema.sh` defaults to data-ingestors master, which lacks both MLM and keypoint; the current schema is on develop. → Repoint the sync source to develop, or promote develop→master. `sync --check` vs master flags this (pre-existing drift, surfaced here).
- … `file_options`). The CLI's emission is correct for a consistent/current deployment.
Test plan
- `go build` / `vet` / `test ./...` green; `push/image_extras_test.go` (OD discover + missing-annotations); `spec_test.go` (OD emits annotations; keypoint emits top-level `target_size` + `number_of_keypoints`; both pass the schema).
- `dataset push ./od --category object_detection --label-column image_label` → 128 records, 100%, rows in `training_test_datasets.clidemo_od3_train`.
Matrix after this PR
✅ live: image_classification, object_detection, tabular_classification, tabular_regression, time_series_forecasting, time_to_event_prediction, text_classification
✅ code-complete (live-blocked by deployment skew): keypoint_detection, masked_language_modeling
⏸ semantic_segmentation — blocked on data-ingestors#136
🤖 Generated with Claude Code