-
Notifications
You must be signed in to change notification settings - Fork 17
Bulk data access server #2482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Bulk data access server #2482
Conversation
|
Demonstration: In the future, additional technologies such as databricks may be used. Also authentication and authorization may be performed through the web client to the pathling-server. use the auth in the createTag method, but unsure what the effects are. Is there something "in" the auth object that stays the same across requests so caching still works? Anyhow, some auth information should maybe be part of the tag (if parts stay the same across requests) |
|
@fhnaumann Could you please merge main into this branch? |
|
Why are there some deleted files from the test data directory in the library API? There are other tests that rely upon this data. |
johngrimes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't actually compile for me yet, and not passing the tests on CI.
I've added some preliminary comments anyway, I can take another look once we have a green build.
As a general comment please also take a look at the CONTRIBUTING.md file and make sure that everything is ticked off there.
library-api/src/main/java/au/csiro/pathling/library/io/sink/NdjsonSink.java
Outdated
Show resolved
Hide resolved
library-api/src/main/java/au/csiro/pathling/library/io/sink/DataSink.java
Outdated
Show resolved
Hide resolved
library-api/src/main/java/au/csiro/pathling/library/io/sink/DataSink.java
Show resolved
Hide resolved
library-api/src/main/java/au/csiro/pathling/library/io/source/QueryableDataSource.java
Outdated
Show resolved
Hide resolved
library-api/src/main/java/au/csiro/pathling/library/io/source/QueryableDataSource.java
Show resolved
Hide resolved
library-api/src/main/java/au/csiro/pathling/library/io/source/TransformChain.java
Outdated
Show resolved
Hide resolved
| @Component | ||
| @Profile("server") | ||
| @Slf4j | ||
| public class ConformanceProvider implements IServerConformanceProvider<CapabilityStatement>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need to be updated to accurately reflect the capabilities of the server now.
pathling-server/src/main/java/au/csiro/pathling/FhirServer.java
Outdated
Show resolved
Hide resolved
pathling-server/src/main/java/au/csiro/pathling/FhirServer.java
Outdated
Show resolved
Hide resolved
|
Re-implementing the $import operation requires some changes to the API:
Change the $import parameters:
Align the $import to the bulk-import manifest:
|
0014bd8 to
14af1df
Compare
Added @ToString.Exclude annotations to privateKeyJwk and clientSecret fields in PnpConfiguration to prevent these sensitive values from appearing in log output.
Changed test-server workflow to run 'mvn verify' which executes both unit tests (via surefire) and integration tests (via failsafe). This ensures that ImportPnpOperationIT and other integration tests are executed and their coverage is reported to SonarCloud. Also increased timeout from 20 to 30 minutes to accommodate integration test execution time.
Created import-pnp.OperationDefinition.json with documentation for the ping and pull bulk import operation, including parameters for exportUrl, inputSource, exportType, and saveMode. Updated ConformanceProvider to advertise both $import and $import-pnp operations in the CapabilityStatement so clients can discover and invoke these operations.
…entation Removed search, aggregate, and extract OperationDefinition resources which are no longer advertised in the CapabilityStatement. Updated import.OperationDefinition.json to: - Mention all supported data formats (NDJSON, Parquet, Delta) - Document all supported save modes (error, overwrite, append, ignore, merge) - Clarify default values for mode and format parameters
Restructured import.OperationDefinition.json to correctly reflect: - inputSource parameter (required string) for SMART spec tracking - input parameter array with type and url sub-parts (matching actual implementation) - inputFormat parameter supporting both MIME types and simple codes - mode parameter as Pathling extension with correct defaults Now aligns with both SMART Bulk Data Import specification and the actual ImportRequest/ImportOperationValidator implementation.
The inputFormat parameter documentation now emphasises MIME types as the primary format per the SMART Bulk Data Import specification, while noting that simple codes are also accepted for convenience.
Updated the import operation implementation to prioritise MIME types (e.g. application/fhir+ndjson) over simple codes (e.g. ndjson) per the SMART Bulk Data Import specification. Simple codes are still supported for backwards compatibility. Updated tests to use MIME types as the primary format in test cases.
…bility Removed support for simple codes (e.g. "ndjson", "parquet", "delta") in favour of strict MIME type usage (e.g. "application/fhir+ndjson") as specified by the SMART Bulk Data Import specification. This ensures proper interoperability with other SMART-compliant systems and prevents ambiguity in format specification. Updated OperationDefinition, implementation, and tests to only accept MIME types, with clearer error messages when unsupported formats are provided.
Updated MIME types to use vendor-specific formats to properly identify Pathling-specific data formats: - application/x-pathling-parquet for Parquet files - application/x-pathling-delta+parquet for Delta Lake files This follows the standard convention for vendor-specific MIME types using the 'x-' prefix and clearly identifies these as Pathling extensions to the SMART Bulk Data Import specification.
Changed ImportFormat enum to use MIME types directly as codes instead of simple strings, eliminating the need for mapping logic in validators. This simplifies the architecture by: - Making ImportFormat.fromCode() work directly with MIME types - Removing duplicate MIME type mapping logic from validators - Ensuring consistency between the enum definition and validation The enum now stores: - NDJSON: application/fhir+ndjson - PARQUET: application/x-pathling-parquet - DELTA: application/x-pathling-delta+parquet Both ImportOperationValidator and ImportPnpOperationValidator now simply delegate to ImportFormat.fromCode() for parsing.
Adds comprehensive E2E tests covering: - Form initialisation and loading states - Standard URL import form interactions - Import execution, progress tracking, and completion - Error handling and job cancellation - PnP (ping-and-pull) import from FHIR server - Tab behaviour during active imports Key implementation note: Mock responses must include Access-Control-Expose-Headers for custom headers like Content-Location and X-Progress to be readable by the browser.
Aligns server validators with OperationDefinition specifications which define parameters as 'code' type rather than 'Coding'. This fixes a ClassCastException when the UI sends parameters as valueCode. Changes: - ParamUtil: Catch ClassCastException and return 400 Bad Request - ImportPnpOperationValidator: Use CodeType for exportType, saveMode, inputFormat - ImportOperationValidator: Use CodeType for resourceType, inputFormat, saveMode - Add inputFormat parameter to import-pnp OperationDefinition - UI: Send all parameters as valueCode instead of valueCoding - Add comprehensive unit tests for validators
Aligns test with recent change to ImportOperationValidator that expects CodeType instead of Coding for the resourceType parameter in import operation requests.
Adds 25 E2E tests covering form functionality, export execution for all four export types (system, patient type, patient instance, group), progress display, file download, cancellation, and error handling.
Adds comprehensive E2E tests covering initialisation, authentication, stored view definitions, custom JSON input, and results display. Includes mock data for ViewDefinition resources and view run responses.
Tests login prompts, OAuth flow initiation, callback error handling, and server capability detection. Focuses on behaviours that can be reliably tested via API mocking without a real OAuth provider.
…ownership checks Export operations ($export, view-export, bulk-submit) now return FHIR Parameters resources directly. A new ParametersToJsonInterceptor converts responses to JSON format when clients specify Accept: application/json. Removed redundant ownership validation from ImportProvider and ExportOperationHelper. These checks compared the current user to the job owner, but since AsyncAspect preserves the same authentication context, the values were always equal. Ownership validation now occurs at the correct layer: JobProvider.$job for status polling and ExportResultProvider.$result for file downloads. Added JobProviderSecurityTest to verify ownership enforcement at the $job endpoint.
The server now returns Parameters resources (converted to plain JSON) instead of Binary-wrapped responses for export status operations. Removed the now-unused Binary detection and base64 decoding logic from bulkExportStatus, jobStatus, and bulkSubmitStatus functions.
Test assertions used incorrect FHIR parameter types and names: - Changed valueCoding to valueCode for simple code values - Changed parameter name from "mode" to "saveMode"
Align with server-side changes that now return FHIR Parameters resources from export operations. Added getExportOutputFiles() helper function to extract output file entries from the Parameters structure. Fixed incorrect nesting in e2e test fixture.
Updates the API endpoint used for job status polling and cancellation across the UI codebase, including tests and E2E specs.
Create ViewExecutionHelper to eliminate code duplication between ViewDefinitionRunProvider and ViewDefinitionInstanceRunProvider. Both providers now delegate view execution to the shared helper.
Consolidates the extractPatientIdsFromGroup method that was duplicated across ViewExecutionHelper, ViewDefinitionExportProvider, and GroupExportProvider into a new shared GroupMemberService.
Adds a new JSON output format that returns query results as a single JSON document containing an array of objects. Each object matches the format of individual NDJSON lines. Unlike NDJSON, JSON cannot be streamed row-by-row so results are collected and written at once.
Replace "Select all" and "Clear" buttons with just "Clear", and add inline note explaining that leaving none selected exports all types.
Extracted common job cancellation patterns into helper methods to eliminate duplicate log messages and improve maintainability. Also fixed incorrect log message in onStageCompleted that said "Stage submitted" instead of "Stage completed".
Users were always redirected to the Dashboard after login instead of the page they were on when they clicked the login button. This was caused by React StrictMode's double-mount behaviour clearing the return URL from sessionStorage before it could be used. Split getAndClearReturnUrl into separate getReturnUrl and clearReturnUrl functions so the URL isn't lost during StrictMode's effect re-execution. Added E2E test that properly mocks fhirclient's sessionStorage state.
Cache invalidation was slow because it scanned all Delta tables to compute a new cache key. Added an invalidate(tablePath) overload that queries only the modified table's timestamp, reducing the operation from O(n) to O(1) for single-resource mutations. Update and delete operations now use the optimised method, while bulk import continues to scan all tables since it modifies multiple resource types.
Add Javadoc to constructor, add final modifiers to parameters, add @nonnull annotation to Cacheable interface method, and remove unused getTableUrl method.
Replace null executor arguments with a no-op ThreadPoolTaskExecutor to satisfy the @nonnull annotation on the CacheableDatabase constructor. Also add missing final modifiers and chain assertions.
Adds pathling.operations.* configuration properties to enable/disable individual server operations. All operations are enabled by default. Disabled operations return a client error and are excluded from the CapabilityStatement. Configurable operations include CRUD (create, read, update, delete, search), batch, export (system/patient/group), import, import-pnp, viewdefinition-run (system/instance), viewdefinition-export, and bulk-submit.
The _format parameter remains optional and overrides the Accept header when provided. Adds fromAcceptHeader() method to ViewOutputFormat with quality value support. Refactors tests to use parameterized tests and improved AssertJ assertions.
Adds a comprehensive test that verifies ViewDefinition resources can be parsed from JSON, encoded to Spark InternalRow, decoded back to ViewDefinitionResource, and serialised to JSON that matches the original. Uses 12 real-world ViewDefinition examples from: - FHIR SQL on FHIR IG (6 views) - sql-on-fhir-evaluation repository (6 views) The test currently fails, revealing missing support for: - fhirVersion field - valueCode constant values Adds JSONAssert as a test dependency for JSON comparison.
Adds support for the fhirVersion field (array of codes) to ViewDefinitionResource to enable round-trip encoding of ViewDefinitions that specify FHIR version compatibility. Expands the constant.value[x] choice type to support all types defined in the SQL on FHIR specification: Base64Binary, Boolean, Canonical, Code, Date, DateTime, Decimal, Id, Instant, Integer, Oid, PositiveInt, String, Time, UnsignedInt, Uri, Url, and Uuid. Adds ConstantTypes.json test fixture to verify round-trip encoding of the supported constant value types.
The server module was using unshaded Gson while ConstantDeclarationTypeAdapterFactory (bundled in library-runtime) uses shaded Gson, causing a type mismatch when registering the adapter. This prevented valueCode, valueString, and other constant value types from being parsed in ViewDefinitions. Removed the direct Gson dependency and switched all server code to use the shaded Gson from library-runtime.
Renamed the branch to reflect the planned changes more accurately.
Relates to #2467, #1987 and #1986.
In the future, this PR may also include #2476 and #1988