Document sampling features live-server lessons

Sam-Bolling · Sam-Bolling · commit 7e45e0d83008 · 2026-06-04T14:41:27.000-04:00
diff --git a/docs/implementation/note-pygeoapi-sampling-features-system-filter-defect.md b/docs/implementation/note-pygeoapi-sampling-features-system-filter-defect.md
@@ -0,0 +1,84 @@
+# pygeoapi Sampling Features System-Filter Defect
+
+**Document date:** June 4, 2026
+
+**Purpose:** Record the remaining `pygeoapi/52North` defect observed after public `samplingFeatures` population was restored on the Oracle-hosted demo environment. This note is intentionally narrow. It documents the defect, the observed evidence, the likely query-path cause, and the acceptance target for a future code fix.
+
+## Summary
+
+The public `pygeoapi/52North` deployment now exposes populated top-level and item-level `samplingFeatures` data, but the nested system traversal path still fails:
+
+- `GET /samplingFeatures` returns populated results
+- `GET /samplingFeatures/{id}` returns populated item results
+- `GET /systems/{id}/samplingFeatures` returns an empty collection for seeded systems that do have associated sampling features
+
+This should be treated as a remaining query-path defect, not as a data-population gap.
+
+## Observed evidence
+
+As verified on June 4, 2026:
+
+- the top-level public collection was populated after repair and bulk indexing
+- individual item reads worked for newly indexed public sampling features
+- `GET /systems/{id}/samplingFeatures` remained empty for known seeded systems
+- top-level filtering with `GET /samplingFeatures?system=<id>` also returned empty results for those same seeded systems
+
+Both system-scoped paths fail for the same systems while unfiltered reads succeed. That pattern strongly suggests that the nested traversal and the top-level `system` filter share the same broken filter logic.
+
+## Why this is not a data-absence problem
+
+The same deployment demonstrated all of the following at the same time:
+
+- sampling-feature documents existed in Elasticsearch
+- the documents were readable through the public top-level collection
+- individual item reads worked through the public API
+- the seeded system identifiers were known and stable
+
+That combination rules out the earlier "empty demo data" explanation for this specific route. The remaining issue is the query logic used to associate sampling features with their parent systems.
+
+## Likely cause
+
+The observed Elasticsearch mapping for the `sampling_features` index included:
+
+- `system` as a `text` field
+- `system.keyword` as an exact-match `keyword` subfield
+
+The server-side query path appears to filter against `system` rather than `system.keyword`.
+
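+That is a bad fit for exact system-ID matching because:
+
+- the system IDs are hyphenated slugs
+- the `text` field's analyzer (the `standard` analyzer, unless the mapping overrides it) splits those values into separate tokens at the hyphens
+- a term-level filter such as `terms` is not analyzed, so the full system ID is compared against those individual tokens and never matches
+- exact parent-system lookups should therefore use the `keyword` subfield, which indexes each value as one untokenized term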
+
+The nested route behavior and the top-level `?system=` behavior both match this diagnosis.
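
The mismatch can be illustrated with a small simulation. This is a sketch, not Elasticsearch itself: it assumes the `system` field uses the default `standard` analyzer, approximates that analyzer as lowercasing plus splitting on non-alphanumeric characters, and uses a hypothetical system ID `desert-weather-system-01`. The key behavior it models is that a `terms` query compares its values verbatim against indexed terms.

```python
import re

def approx_standard_analyzer(value: str) -> list[str]:
    """Rough stand-in for Elasticsearch's `standard` analyzer (an assumption,
    not the real tokenizer): lowercase, then split on every character that
    is not an ASCII letter or digit, hyphens included."""
    return re.findall(r"[a-z0-9]+", value.lower())

def indexed_terms(value: str, field_type: str) -> set[str]:
    """Terms the inverted index would hold for one field value."""
    if field_type == "keyword":
        return {value}  # keyword: the whole value is a single term
    return set(approx_standard_analyzer(value))  # text: one term per token

def terms_query_matches(stored: str, field_type: str, query_values: list[str]) -> bool:
    """`terms` queries are not analyzed: a value must equal an indexed term exactly."""
    terms = indexed_terms(stored, field_type)
    return any(v in terms for v in query_values)

system_id = "desert-weather-system-01"  # hypothetical seeded system ID
print(approx_standard_analyzer(system_id))                     # ['desert', 'weather', 'system', '01']
print(terms_query_matches(system_id, "text", [system_id]))     # False: filter on `system`
print(terms_query_matches(system_id, "keyword", [system_id]))  # True: filter on `system.keyword`
```

The same stored value matches only when the filter targets the keyword subfield, which is the behavior both failing routes exhibit.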
+
+## Probable fix direction
+
+The future code fix should keep the existing multi-value filter shape but route the filter to the exact-match field.
+
+In practical terms, the likely change is:
+
+- keep a `terms`-style filter if the API layer passes `system` as a list
+- target `system.keyword` rather than `system`
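
For illustration only, the corrected filter could yield an Elasticsearch search body shaped like the one below. The helper name `build_system_filter_query`, the `bool`/`filter` wrapper, and the `from`/`size` paging are assumptions about how the query is assembled; the real `pygeoapi/52North` query path may add other clauses.

```python
import json

def build_system_filter_query(system_ids, size=10, offset=0):
    """Hypothetical helper: build a search body that filters sampling
    features by parent system using exact keyword matching."""
    if isinstance(system_ids, str):
        system_ids = [system_ids]  # keep the multi-value (list) filter shape
    return {
        "from": offset,  # paging offset
        "size": size,    # page size
        "query": {
            "bool": {
                # filter context: exact, unscored match on the keyword subfield
                "filter": [{"terms": {"system.keyword": list(system_ids)}}]
            }
        },
    }

print(json.dumps(build_system_filter_query("desert-weather-system-01"), indent=2))
```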
+
+This note intentionally does not prescribe a specific patch beyond that direction. The owning repo should implement and verify the fix in its own test suite and deployment process.
+
+## Acceptance target for the fix
+
+Treat the defect as fixed only when all of the following succeed for a seeded or otherwise known-associated system:
+
+1. `GET /samplingFeatures?system=<system-id>` returns non-empty results.
+2. `GET /systems/{id}/samplingFeatures` returns non-empty results.
+3. Both paths return the same set of sampling-feature IDs for the same system.
+4. Top-level `GET /samplingFeatures` still works.
+5. Direct `GET /samplingFeatures/{id}` item reads still work.
+6. Paging still works on both the filtered top-level path and the nested path.
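
Criteria 1 to 3 can be checked mechanically once both responses are fetched. This sketch assumes each response is a GeoJSON-style `FeatureCollection` whose `features` each carry an `id`, and that both responses hold the complete result set (for example, after following every page). It interprets consistency as identical ID sets, so ordering differences between the two paths do not cause false failures. The function name is hypothetical.

```python
def check_system_paths(filtered: dict, nested: dict) -> list[str]:
    """Return failure messages; an empty list means criteria 1-3 pass.

    `filtered` is the parsed body of GET /samplingFeatures?system=<id>;
    `nested` is the parsed body of GET /systems/{id}/samplingFeatures.
    """
    failures = []
    filtered_ids = {f["id"] for f in filtered.get("features", [])}
    nested_ids = {f["id"] for f in nested.get("features", [])}
    if not filtered_ids:
        failures.append("filtered top-level path returned no features")
    if not nested_ids:
        failures.append("nested path returned no features")
    if filtered_ids != nested_ids:
        failures.append(
            f"paths disagree: only filtered={sorted(filtered_ids - nested_ids)}, "
            f"only nested={sorted(nested_ids - filtered_ids)}"
        )
    return failures
```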
+
+## Recommended follow-up
+
+- Open or update a `csapi-pygeoapi` issue with this note as the evidence summary.
+- Add an automated regression test for both:
+  - `GET /samplingFeatures?system=<id>`
+  - `GET /systems/{id}/samplingFeatures`
+- Keep this defect separate from demo-data stewardship work. The demo population issue was remediated; this remaining issue is code-path behavior.
diff --git a/docs/implementation/public-sampling-features-seeding-and-verification-runbook.md b/docs/implementation/public-sampling-features-seeding-and-verification-runbook.md
@@ -0,0 +1,169 @@
+# Public Sampling Features Seeding and Verification Runbook
+
+**Document date:** June 4, 2026
+
+**Purpose:** Provide a repeatable operational runbook for maintaining public `samplingFeatures` readiness across the Oracle-hosted CSAPI demo deployments. This runbook is based on the June 4, 2026 live remediation work and is intended to prevent future drift into empty, misleading, or partially readable public `samplingFeatures` surfaces.
+
+## Scope
+
+This runbook is for operational stewardship of public demo data. It is not a standards-interpretation document, and it does not replace implementation-specific bug fixing.
+
+Use it when a public deployment:
+
+- exposes `samplingFeatures` routes but returns empty collections
+- accepts create operations without dependable readback
+- shows divergence between top-level, item-level, and nested traversal paths
+- needs a richer public demo surface for downstream integration testing
+
+## Readiness target
+
+Before a public deployment is described as ready for positive `samplingFeatures` interoperability work, it should satisfy all of the following:
+
+1. `GET /samplingFeatures` returns `200` and a non-empty collection.
+2. `GET /samplingFeatures/{id}` works for at least one public item.
+3. `GET /systems/{id}/samplingFeatures` works for at least one system that should have associated sampling features.
+4. At least one returned sampling feature includes:
+   - geometry
+   - `uid`
+   - `name`
+   - `featureType`
+5. Post-write verification is possible where create is supported.
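
Criterion 4 can be checked per feature. This sketch assumes the GeoJSON encoding, where `uid`, `name`, and `featureType` sit under `properties` and geometry is the top-level `geometry` member; a deployment using another encoding would need different lookups. The function names are hypothetical.

```python
REQUIRED_PROPERTIES = ("uid", "name", "featureType")

def missing_readiness_fields(feature: dict) -> list[str]:
    """List the criterion-4 fields that are absent or empty on one feature."""
    missing = []
    if not feature.get("geometry"):
        missing.append("geometry")
    props = feature.get("properties") or {}
    missing.extend(name for name in REQUIRED_PROPERTIES if not props.get(name))
    return missing

def collection_meets_criterion_4(collection: dict) -> bool:
    """True if at least one returned feature carries every required field."""
    return any(not missing_readiness_fields(f) for f in collection.get("features", []))
```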
+
+## Seed-corpus pattern
+
+Prefer a contextual seed corpus over anonymous generated points.
+
+### Recommended seed families
+
+The June 4, 2026 remediation used these ten families:
+
+- desert weather
+- coastal buoy
+- river gauge
+- indoor thermometry
+- airport meteorology
+- estuary water quality
+- acoustic array
+- urban air monitoring
+- agricultural field monitoring
+- wildfire-edge monitoring
+
+### Recommended seed properties
+
+Each family should include:
+
+- one stable seed system
+- 100 or more spatially distributed sampling features
+- deterministic UIDs
+- deterministic item numbering
+- realistic names and descriptions
+- stable `featureType`
+- stable `sampledFeature@link`
+
+The goal is not random bulk volume. The goal is reusable, plausible, and inspectable public test data.
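
A minimal sketch of deterministic seed generation under these properties. The UID scheme (`urn:example:sf:...`), the naming pattern, the square-grid layout, and the `sampledFeature@link` object shape are all hypothetical; the point is that rerunning the generator yields identical UIDs, numbering, and coordinates.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SeedFeature:
    uid: str
    name: str
    lon: float
    lat: float

    def to_geojson(self, feature_type: str, sampled_feature_href: str) -> dict:
        """Render as a GeoJSON sampling feature (property layout assumed)."""
        return {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [self.lon, self.lat]},
            "properties": {
                "uid": self.uid,
                "name": self.name,
                "featureType": feature_type,
                "sampledFeature@link": {"href": sampled_feature_href},
            },
        }

def seed_family(family_slug: str, label: str, origin: tuple[float, float],
                count: int = 100, spacing_deg: float = 0.01) -> list[SeedFeature]:
    """Place `count` features on a square grid starting at `origin` (lon, lat).

    Every value is derived from the inputs, so reruns produce identical
    UIDs, item numbers, and coordinates.
    """
    lon0, lat0 = origin
    side = math.isqrt(max(count - 1, 0)) + 1  # smallest grid width that fits `count`
    features = []
    for n in range(1, count + 1):  # deterministic 1-based item numbering
        row, col = divmod(n - 1, side)
        features.append(SeedFeature(
            uid=f"urn:example:sf:{family_slug}:{n:04d}",  # hypothetical UID scheme
            name=f"{label} site {n:03d}",
            lon=round(lon0 + col * spacing_deg, 6),
            lat=round(lat0 + row * spacing_deg, 6),
        ))
    return features
```

A real corpus would also vary descriptions per family, but keeping generation a pure function of its inputs is what makes reseeding safe to repeat.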
+
+## Verification sequence
+
+Run these checks in order.
+
+### 1. Collection presence
+
+Verify:
+
+- top-level `GET /samplingFeatures`
+- collection count or feature length
+- paging with `limit`
+
+This answers whether public data is present at all.
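
Collection presence and paging can be checked together by following `next` links. This sketch assumes the OGC API convention of a `links` array whose entries carry `rel` and `href`, and takes the HTTP GET as an injected `fetch_json` callable so it runs against a live server or a stub. `collect_all_ids` and `urllib_fetch_json` are hypothetical names; the `max_pages` guard stops a server that keeps returning `next` links from looping forever.

```python
import json
import urllib.request
from typing import Callable
from urllib.parse import urlencode, urljoin

def urllib_fetch_json(url: str) -> dict:
    """Plain GET returning parsed JSON; any HTTP client would do."""
    req = urllib.request.Request(url, headers={"Accept": "application/geo+json, application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def collect_all_ids(base_url: str, fetch_json: Callable[[str], dict],
                    limit: int = 10, max_pages: int = 100) -> list[str]:
    """Walk a paged collection via `next` links and return every feature ID."""
    sep = "&" if "?" in base_url else "?"
    url = f"{base_url}{sep}{urlencode({'limit': limit})}"
    ids: list[str] = []
    for _ in range(max_pages):
        page = fetch_json(url)
        features = page.get("features", [])
        if len(features) > limit:  # the server ignored `limit`
            raise ValueError(f"page held {len(features)} items with limit={limit}")
        ids.extend(f["id"] for f in features)
        next_href = next((link["href"] for link in page.get("links", [])
                          if link.get("rel") == "next"), None)
        if next_href is None:  # no `next` link: last page reached
            return ids
        url = urljoin(url, next_href)  # tolerate relative `next` links
    raise RuntimeError(f"still paging after {max_pages} pages")
```

Usage against a live deployment would look like `collect_all_ids("<deployment-url>/samplingFeatures", urllib_fetch_json, limit=25)`; the length of the result is the collection count to record.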
+
+### 2. Item readback
+
+Pick one known item ID from the public collection and verify:
+
+- `GET /samplingFeatures/{id}`
+
+This confirms that the collection is not exposing references to unreadable items.
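
A sketch of the item-readback check. `fetch_status_and_json` is a hypothetical injected callable returning an HTTP status and a parsed body (or `None`). Beyond the status code, the check confirms that the returned item is the one that was requested.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import quote

FetchResult = Tuple[int, Optional[dict]]

def check_item_readback(base_url: str, item_id: str,
                        fetch_status_and_json: Callable[[str], FetchResult]) -> list[str]:
    """Return failure messages for GET {base_url}/samplingFeatures/{item_id}."""
    url = f"{base_url}/samplingFeatures/{quote(item_id, safe='')}"
    status, body = fetch_status_and_json(url)
    if status != 200:
        return [f"item {item_id}: HTTP {status}"]
    if not body:
        return [f"item {item_id}: empty body"]
    if body.get("id") != item_id:  # guard against a proxy or server returning the wrong item
        return [f"item {item_id}: body id is {body.get('id')!r}"]
    return []
```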
+
+### 3. Nested traversal
+
+Pick one known parent system and verify:
+
+- `GET /systems/{id}/samplingFeatures`
+
+This must be checked separately. Do not assume that a working top-level collection implies a working nested traversal path.
+
+### 4. Filtered top-level behavior
+
+If the implementation supports filtering by parent system, verify:
+
+- `GET /samplingFeatures?system=<id>`
+
+This is particularly important because a stack can expose a working top-level collection while the parent-system filter path is still broken. The June 4, 2026 `pygeoapi/52North` deployment showed exactly this pattern.
+
+### 5. Create-readback verification
+
+Where create is supported:
+
+1. create a known sampling feature under a known system
+2. record the returned ID or `Location`
+3. verify direct item readback
+4. verify top-level collection visibility
+5. verify nested collection visibility
+
+Do not treat the status code as sufficient evidence by itself.
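
The five steps can be written as a postcondition check that records the create status but never relies on it. This is a sketch under several assumptions: `post_json` and `get_json` are hypothetical injected callables standing in for the HTTP client; the create endpoint is assumed to be `POST /systems/{id}/samplingFeatures` with the new resource URL in a `Location` header; and collection bodies are assumed to be single-page GeoJSON `FeatureCollection`s. On a large paged collection, a real check would need to page through the results or search by UID.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

def verify_create_readback(
    base_url: str,
    system_id: str,
    new_feature: dict,
    post_json: Callable[[str, dict], Tuple[int, dict]],
    get_json: Callable[[str], Optional[dict]],
) -> list[str]:
    """Create a sampling feature, then check every read path for it.

    `post_json` returns (status, headers); `get_json` returns a parsed body
    or None. An error status is reported, but reads are still checked,
    because a create can persist even when the server reports failure.
    """
    uid = new_feature["properties"]["uid"]
    status, headers = post_json(f"{base_url}/systems/{system_id}/samplingFeatures", new_feature)
    failures = [] if status in (200, 201) else [f"create returned HTTP {status}"]

    location = headers.get("Location")
    if not location:
        return failures + ["no Location header; new item cannot be located"]
    item_url = urljoin(base_url + "/", location)  # handles relative or absolute Location
    new_id = item_url.rstrip("/").rsplit("/", 1)[-1]

    item = get_json(item_url)
    if not item or (item.get("properties") or {}).get("uid") != uid:
        failures.append("direct item readback failed or returned a different uid")

    for label, url in (("top-level", f"{base_url}/samplingFeatures"),
                       ("nested", f"{base_url}/systems/{system_id}/samplingFeatures")):
        body = get_json(url) or {}
        if new_id not in {f.get("id") for f in body.get("features", [])}:
            failures.append(f"{label} collection does not list {new_id}")
    return failures
```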
+
+## Write-path guardrails
+
+The June 4, 2026 live work showed that write-path status codes can be misleading.
+
+- A create may persist even when the server returns an error.
+- A create may return success while the read path still cannot expose the new resource correctly.
+
+Because of that, every write-path check should be treated as a postcondition check, not a status-only check.
+
+## Backing-store repair boundary
+
+Some deployments may require backing-store repair rather than API-only remediation.
+
+Examples of when that boundary has been crossed:
+
+- public create returns success but readback remains broken
+- documents exist in the backing store but lack the representation branch expected by the API read path
+- top-level and item reads behave differently for the same resource family
+
+When this happens:
+
+- document the backing-store contract
+- repair documents in a controlled way
+- rerun the full readback sequence after repair
+
+Do not treat backing-store repair as a substitute for fixing a genuine API query-path bug.
+
+## Distinguish operational state from implementation defects
+
+Keep these categories separate:
+
+- **Operational/data-state issue:** route exists but is empty or thinly seeded
+- **Implementation defect:** route, filter, traversal, representation, or readback logic is wrong
+
+The same deployment can have both at once. The runbook should not blur them together.
+
+## Minimum reporting after a maintenance pass
+
+After any seeding or repair pass, record:
+
+- deployment URL
+- date and time
+- top-level collection count
+- item-read verification result
+- nested-traversal verification result
+- filtered top-level verification result, if applicable
+- whether create-readback was exercised
+- any remaining route-specific defects
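
One way to keep these records uniform is a small structured record serialized to JSON. The field names mirror the list above; the class name, the `"pass"`/`"fail"` values, and the JSON layout are hypothetical conventions rather than an existing schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class MaintenancePassReport:
    deployment_url: str
    top_level_count: int
    item_read: str                            # "pass" or "fail"
    nested_traversal: str                     # "pass" or "fail"
    filtered_top_level: Optional[str] = None  # None when the filter is unsupported
    create_readback_exercised: bool = False
    remaining_defects: list[str] = field(default_factory=list)
    recorded_at: str = field(  # UTC timestamp captured when the record is created
        default_factory=lambda: datetime.now(timezone.utc).isoformat(timespec="seconds"))

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```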
+
+## Recommended follow-up integration
+
+- Add these checks to live public smoke-test workflows.
+- Keep the seed corpus versioned and idempotent.
+- Preserve stable batch identifiers so cleanup and refresh remain possible.
+- Re-run this checklist after deployment rebuilds, proxy changes, or data refreshes.
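
Idempotent refresh with stable batch identifiers can be sketched as a plain set comparison. This assumes, as a hypothetical convention, that every seeded UID begins with a stable batch prefix; UIDs outside that prefix are never scheduled for removal.

```python
def plan_seed_refresh(existing_uids, seed_uids, batch_prefix):
    """Diff live UIDs against one versioned seed batch.

    Applying the returned plan and running it again yields nothing to
    create or remove, which is what makes the refresh idempotent.
    """
    seed = set(seed_uids)
    stray = sorted(u for u in seed if not u.startswith(batch_prefix))
    if stray:
        raise ValueError(f"seed UIDs outside batch {batch_prefix!r}: {stray}")
    live = {uid for uid in existing_uids if uid.startswith(batch_prefix)}
    return {
        "create": sorted(seed - live),
        "keep": sorted(seed & live),
        "remove": sorted(live - seed),  # stale items from an earlier corpus version
    }
```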
diff --git a/docs/research/testing/findings/39-live-sampling-feature-population-and-readback-lessons.md b/docs/research/testing/findings/39-live-sampling-feature-population-and-readback-lessons.md