Skip to content

Conversation

@captaincoordinates
Copy link
Collaborator

@captaincoordinates captaincoordinates commented Nov 22, 2025

Closes #196

Adds the ability for the indexer to store the source STAC JSON in the index for both items and collections. API determines at runtime whether STAC can be sourced from the index or must be fetched from source.

Additional this change fixes a number of type compatibility issues related to different representations of Items in stac_pydantic and stac_fastapi.

Consolidates all STAC Item and Collection creation from source data for the API in a single location whereas this was previously spread across both core.py and search_handler.py.

Defaults running with remote sources to persisting STAC in the index, hoping to improve stability with potentially unreliable third-party data sources.

All integration tests now execute with both persisted and non-persisted STAC.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds the capability to persist STAC JSON content directly in the index for both items and collections, reducing reliance on potentially unreliable third-party data sources. The API now determines at runtime whether to fetch STAC content from the index (when persisted) or from the original source location. This refactoring also consolidates all STAC retrieval logic into a new centralized fetcher.py module and resolves type compatibility issues between stac_pydantic and stac_fastapi representations. By default, remote source deployments now persist STAC content for improved stability, and all integration tests execute against both persisted and non-persisted configurations.

  • Added new database columns (item_content, collection_content) to store optional JSON representations
  • Consolidated STAC fetching logic from core.py and search_handler.py into new stac/fetcher.py module
  • Updated type definitions to use composition instead of inheritance for better compatibility

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/stac_fastapi/indexed/stac/fetcher.py New centralized module for all STAC item and collection retrieval with support for both persisted and fetched content
src/stac_fastapi/indexed/core.py Refactored to delegate STAC retrieval to fetcher module, removing duplicated fetch logic
src/stac_fastapi/indexed/search/search_handler.py Updated to use centralized item fetching from query rows
packages/stac-index/src/stac_index/indexer/creator/creator.py Enhanced to persist STAC JSON content when configured, with support for null geometries
packages/stac-index/src/stac_index/indexer/stac_parser.py Modified to return dictionaries and fix names instead of Item objects
packages/stac-index/src/stac_index/indexer/types/stac_data.py Refactored type definitions to use composition pattern for better type compatibility
packages/stac-index/src/stac_index/indexer/types/index_config.py Added persist_stac_content configuration option
packages/stac-index/src/stac_index/indexer/creator/sql/01-tables/001-collections.sql Added collection_content JSON column
packages/stac-index/src/stac_index/indexer/creator/sql/01-tables/002-items.sql Added item_content JSON column
tests/test_fetcher.py New comprehensive test suite for fetcher module with both persisted and fetch scenarios
tests/test_search_handler.py Updated tests to mock new centralized fetching function
tests/test_core.py Removed (tests consolidated into test_fetcher.py)
tests/with_environment/integration_tests/*.py Updated JSON comparison to use sort_keys=True for deterministic equality checks
scripts/tests/integration-test.sh Enhanced to run all integration tests with both persisted and non-persisted configurations
scripts/run-with-remote-source.sh Updated to enable STAC persistence by default for remote sources
docker-compose.*.yml Added environment variable support for selecting index configuration
data/STAC/sample/index-config-with-persistence.json New configuration file for testing with persisted STAC content
docs/index-config.md Added documentation for persist_stac_content option
docs/suitability.md Updated to mention STAC persistence as mitigation for index staleness
README.md Added section explaining persisted STAC content for remote sources

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

) -> Collection | None:
if row.collection_content is None:
try:
collection = Collection(**await _fetch_dict(row[1]))
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent named tuple access. The code uses row[1] to access the stac_location field, but elsewhere in the same function (line 165) it correctly uses row.stac_location. Since row is defined as a CollectionQueryRow NamedTuple with named fields, the code should consistently use the field name.

Replace row[1] with row.stac_location for consistency and clarity.

Suggested change
collection = Collection(**await _fetch_dict(row[1]))
collection = Collection(**await _fetch_dict(row.stac_location))

Copilot uses AI. Check for mistakes.

#### Persisted STAC Content

The remote source script is configured to persist STAC content in the index. See [enabling STAC persistence](./index-config.md#persist-stac-content) for more information on this subject. If you do not want STAC JSON persisted in the index for remote sources you should alter this configuration.
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation link is incorrect. The link [enabling STAC persistence](./index-config.md#persist-stac-content) should be [enabling STAC persistence](./docs/index-config.md#persist-stac-content) to correctly point to the documentation file in the docs/ directory.

Suggested change
The remote source script is configured to persist STAC content in the index. See [enabling STAC persistence](./index-config.md#persist-stac-content) for more information on this subject. If you do not want STAC JSON persisted in the index for remote sources you should alter this configuration.
The remote source script is configured to persist STAC content in the index. See [enabling STAC persistence](./docs/index-config.md#persist-stac-content) for more information on this subject. If you do not want STAC JSON persisted in the index for remote sources you should alter this configuration.

Copilot uses AI. Check for mistakes.
Comment on lines +339 to +341
geometry_wkt_constructor=f"ST_GeomFromText('{item_with_location_and_fixes.item.geometry.wkt}')"
if item_with_location_and_fixes.item.geometry
else "NULL"
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema constraint violation: The code now handles null geometries by inserting NULL into the database (line 341), but the schema in 002-items.sql defines the geometry column as GEOMETRY NOT NULL. This will cause a constraint violation at runtime when attempting to insert items without geometry.

Either the schema should be updated to allow NULL geometries (geometry GEOMETRY NULL), or items without geometry should be rejected during validation before reaching the insert statement.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support optional persistence of STAC JSON

2 participants