-
Notifications
You must be signed in to change notification settings - Fork 1
196 persist stac #197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
196 persist stac #197
Conversation
5e434e6 to
e5bea07
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds the capability to persist STAC JSON content directly in the index for both items and collections, reducing reliance on potentially unreliable third-party data sources. The API now determines at runtime whether to fetch STAC content from the index (when persisted) or from the original source location. This refactoring also consolidates all STAC retrieval logic into a new centralized fetcher.py module and resolves type compatibility issues between stac_pydantic and stac_fastapi representations. By default, remote source deployments now persist STAC content for improved stability, and all integration tests execute against both persisted and non-persisted configurations.
- Added new database columns (
item_content,collection_content) to store optional JSON representations - Consolidated STAC fetching logic from
core.pyandsearch_handler.pyinto newstac/fetcher.pymodule - Updated type definitions to use composition instead of inheritance for better compatibility
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/stac_fastapi/indexed/stac/fetcher.py | New centralized module for all STAC item and collection retrieval with support for both persisted and fetched content |
| src/stac_fastapi/indexed/core.py | Refactored to delegate STAC retrieval to fetcher module, removing duplicated fetch logic |
| src/stac_fastapi/indexed/search/search_handler.py | Updated to use centralized item fetching from query rows |
| packages/stac-index/src/stac_index/indexer/creator/creator.py | Enhanced to persist STAC JSON content when configured, with support for null geometries |
| packages/stac-index/src/stac_index/indexer/stac_parser.py | Modified to return dictionaries and fix names instead of Item objects |
| packages/stac-index/src/stac_index/indexer/types/stac_data.py | Refactored type definitions to use composition pattern for better type compatibility |
| packages/stac-index/src/stac_index/indexer/types/index_config.py | Added persist_stac_content configuration option |
| packages/stac-index/src/stac_index/indexer/creator/sql/01-tables/001-collections.sql | Added collection_content JSON column |
| packages/stac-index/src/stac_index/indexer/creator/sql/01-tables/002-items.sql | Added item_content JSON column |
| tests/test_fetcher.py | New comprehensive test suite for fetcher module with both persisted and fetch scenarios |
| tests/test_search_handler.py | Updated tests to mock new centralized fetching function |
| tests/test_core.py | Removed (tests consolidated into test_fetcher.py) |
| tests/with_environment/integration_tests/*.py | Updated JSON comparison to use sort_keys=True for deterministic equality checks |
| scripts/tests/integration-test.sh | Enhanced to run all integration tests with both persisted and non-persisted configurations |
| scripts/run-with-remote-source.sh | Updated to enable STAC persistence by default for remote sources |
| docker-compose.*.yml | Added environment variable support for selecting index configuration |
| data/STAC/sample/index-config-with-persistence.json | New configuration file for testing with persisted STAC content |
| docs/index-config.md | Added documentation for persist_stac_content option |
| docs/suitability.md | Updated to mention STAC persistence as mitigation for index staleness |
| README.md | Added section explaining persisted STAC content for remote sources |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) -> Collection | None: | ||
| if row.collection_content is None: | ||
| try: | ||
| collection = Collection(**await _fetch_dict(row[1])) |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent named tuple access. The code uses row[1] to access the stac_location field, but elsewhere in the same function (line 165) it correctly uses row.stac_location. Since row is defined as a CollectionQueryRow NamedTuple with named fields, the code should consistently use the field name.
Replace row[1] with row.stac_location for consistency and clarity.
| collection = Collection(**await _fetch_dict(row[1])) | |
| collection = Collection(**await _fetch_dict(row.stac_location)) |
|
|
||
| #### Persisted STAC Content | ||
|
|
||
| The remote source script is configured to persist STAC content in the index. See [enabling STAC persistence](./index-config.md#persist-stac-content) for more information on this subject. If you do not want STAC JSON persisted in the index for remote sources you should alter this configuration. |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation link is incorrect. The link [enabling STAC persistence](./index-config.md#persist-stac-content) should be [enabling STAC persistence](./docs/index-config.md#persist-stac-content) to correctly point to the documentation file in the docs/ directory.
| The remote source script is configured to persist STAC content in the index. See [enabling STAC persistence](./index-config.md#persist-stac-content) for more information on this subject. If you do not want STAC JSON persisted in the index for remote sources you should alter this configuration. | |
| The remote source script is configured to persist STAC content in the index. See [enabling STAC persistence](./docs/index-config.md#persist-stac-content) for more information on this subject. If you do not want STAC JSON persisted in the index for remote sources you should alter this configuration. |
| geometry_wkt_constructor=f"ST_GeomFromText('{item_with_location_and_fixes.item.geometry.wkt}')" | ||
| if item_with_location_and_fixes.item.geometry | ||
| else "NULL" |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema constraint violation: The code now handles null geometries by inserting NULL into the database (line 341), but the schema in 002-items.sql defines the geometry column as GEOMETRY NOT NULL. This will cause a constraint violation at runtime when attempting to insert items without geometry.
Either the schema should be updated to allow NULL geometries (geometry GEOMETRY NULL), or items without geometry should be rejected during validation before reaching the insert statement.
Closes #196
Adds the ability for the indexer to store the source STAC JSON in the index for both items and collections. API determines at runtime whether STAC can be sourced from the index or must be fetched from source.
Additional this change fixes a number of type compatibility issues related to different representations of Items in stac_pydantic and stac_fastapi.
Consolidates all STAC Item and Collection creation from source data for the API in a single location whereas this was previously spread across both core.py and search_handler.py.
Defaults running with remote sources to persisting STAC in the index, hoping to improve stability with potentially unreliable third-party data sources.
All integration tests now execute with both persisted and non-persisted STAC.