Skip to content

Conversation

@fhnaumann
Copy link
Collaborator

@fhnaumann fhnaumann commented Sep 10, 2025

Renamed the branch to reflect the planned changes more accurately.

Relates to #2467, #1987 and #1986.
In the future, this PR may also include #2476 and #1988

@github-project-automation github-project-automation bot moved this to Backlog in Pathling Sep 15, 2025
@johngrimes johngrimes moved this from Backlog to In progress in Pathling Sep 15, 2025
@johngrimes johngrimes added fhirpath Related to fhirpath reference implementation new feature New feature or request labels Sep 15, 2025
@fhnaumann
Copy link
Collaborator Author

fhnaumann commented Sep 16, 2025

Demonstration:
An AidBox server is deployed to Kubernetes. A small script pulls data from it (using bulk export?) and converts the ndjson files to parquet files (delta tables). The developed pathling-server uses that data as input and it can be requested to perform bulk export on it. Requests are made through a web client. Every component is dockerized, packaged using helm charts and then deployed to Kubernetes.

In the future, additional technologies such as databricks may be used. Also authentication and authorization may be performed through the web client to the pathling-server.

use the auth in the createTag method, but unsure what the effects are. Is there something "in" the auth object that stays the same across requests so caching still works? Anyhow, some auth information should maybe be part of the tag (if parts stay the same across requests)

@johngrimes johngrimes self-requested a review September 18, 2025 01:39
@johngrimes
Copy link
Member

@fhnaumann Could you please merge main into this branch?

@johngrimes
Copy link
Member

Why are there some deleted files from the test data directory in the library API? There are other tests that rely upon this data.

Copy link
Member

@johngrimes johngrimes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't actually compile for me yet, and not passing the tests on CI.

I've added some preliminary comments anyway, I can take another look once we have a green build.

As a general comment please also take a look at the CONTRIBUTING.md file and make sure that everything is ticked off there.

@Component
@Profile("server")
@Slf4j
public class ConformanceProvider implements IServerConformanceProvider<CapabilityStatement>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be updated to accurately reflect the capabilities of the server now.

@fhnaumann
Copy link
Collaborator Author

fhnaumann commented Oct 15, 2025

Re-implementing the $import operation requires some changes to the API:

  • DataSourceBuilder:
    • Add the ability to load a list of filepaths instead of just an entire directory. Proposed change: Overload the current methods which take in a directory path with new methods which take in Collection<String> filepaths

Change the $import parameters:

  • rename source to input
  • move mode and format to top-level

Align the $import to the bulk-import manifest:

  • only support mandatory fields (except for inputSource)

@johngrimes johngrimes moved this from In progress to Planned in Pathling Oct 26, 2025
@johngrimes johngrimes moved this from Planned to In progress in Pathling Nov 3, 2025
@johngrimes johngrimes assigned johngrimes and unassigned fhnaumann Nov 3, 2025
@johngrimes johngrimes changed the title Pathling 8 server Bulk data access server Nov 3, 2025
Added @ToString.Exclude annotations to privateKeyJwk and clientSecret
fields in PnpConfiguration to prevent these sensitive values from
appearing in log output.
Changed test-server workflow to run 'mvn verify' which executes both
unit tests (via surefire) and integration tests (via failsafe). This
ensures that ImportPnpOperationIT and other integration tests are
executed and their coverage is reported to SonarCloud.

Also increased timeout from 20 to 30 minutes to accommodate integration
test execution time.
Created import-pnp.OperationDefinition.json with documentation for the
ping and pull bulk import operation, including parameters for exportUrl,
inputSource, exportType, and saveMode.

Updated ConformanceProvider to advertise both $import and $import-pnp
operations in the CapabilityStatement so clients can discover and invoke
these operations.
…entation

Removed search, aggregate, and extract OperationDefinition resources
which are no longer advertised in the CapabilityStatement.

Updated import.OperationDefinition.json to:
- Mention all supported data formats (NDJSON, Parquet, Delta)
- Document all supported save modes (error, overwrite, append, ignore, merge)
- Clarify default values for mode and format parameters
Restructured import.OperationDefinition.json to correctly reflect:
- inputSource parameter (required string) for SMART spec tracking
- input parameter array with type and url sub-parts (matching actual implementation)
- inputFormat parameter supporting both MIME types and simple codes
- mode parameter as Pathling extension with correct defaults

Now aligns with both SMART Bulk Data Import specification and the
actual ImportRequest/ImportOperationValidator implementation.
The inputFormat parameter documentation now emphasises MIME types
as the primary format per the SMART Bulk Data Import specification,
while noting that simple codes are also accepted for convenience.
Updated the import operation implementation to prioritise MIME types
(e.g. application/fhir+ndjson) over simple codes (e.g. ndjson) per the
SMART Bulk Data Import specification. Simple codes are still supported
for backwards compatibility.

Updated tests to use MIME types as the primary format in test cases.
…bility

Removed support for simple codes (e.g. "ndjson", "parquet", "delta")
in favour of strict MIME type usage (e.g. "application/fhir+ndjson")
as specified by the SMART Bulk Data Import specification.

This ensures proper interoperability with other SMART-compliant systems
and prevents ambiguity in format specification.

Updated OperationDefinition, implementation, and tests to only accept
MIME types, with clearer error messages when unsupported formats are
provided.
Updated MIME types to use vendor-specific formats to properly identify
Pathling-specific data formats:
- application/x-pathling-parquet for Parquet files
- application/x-pathling-delta+parquet for Delta Lake files

This follows the standard convention for vendor-specific MIME types
using the 'x-' prefix and clearly identifies these as Pathling
extensions to the SMART Bulk Data Import specification.
Changed ImportFormat enum to use MIME types directly as codes instead
of simple strings, eliminating the need for mapping logic in validators.

This simplifies the architecture by:
- Making ImportFormat.fromCode() work directly with MIME types
- Removing duplicate MIME type mapping logic from validators
- Ensuring consistency between the enum definition and validation

The enum now stores:
- NDJSON: application/fhir+ndjson
- PARQUET: application/x-pathling-parquet
- DELTA: application/x-pathling-delta+parquet

Both ImportOperationValidator and ImportPnpOperationValidator now
simply delegate to ImportFormat.fromCode() for parsing.
Adds comprehensive E2E tests covering:
- Form initialisation and loading states
- Standard URL import form interactions
- Import execution, progress tracking, and completion
- Error handling and job cancellation
- PnP (ping-and-pull) import from FHIR server
- Tab behaviour during active imports

Key implementation note: Mock responses must include
Access-Control-Expose-Headers for custom headers like
Content-Location and X-Progress to be readable by the browser.
Aligns server validators with OperationDefinition specifications which
define parameters as 'code' type rather than 'Coding'. This fixes a
ClassCastException when the UI sends parameters as valueCode.

Changes:
- ParamUtil: Catch ClassCastException and return 400 Bad Request
- ImportPnpOperationValidator: Use CodeType for exportType, saveMode,
  inputFormat
- ImportOperationValidator: Use CodeType for resourceType, inputFormat,
  saveMode
- Add inputFormat parameter to import-pnp OperationDefinition
- UI: Send all parameters as valueCode instead of valueCoding
- Add comprehensive unit tests for validators
Aligns test with recent change to ImportOperationValidator that expects
CodeType instead of Coding for the resourceType parameter in import
operation requests.
Adds 25 E2E tests covering form functionality, export execution for all
four export types (system, patient type, patient instance, group),
progress display, file download, cancellation, and error handling.
Adds comprehensive E2E tests covering initialisation, authentication,
stored view definitions, custom JSON input, and results display.
Includes mock data for ViewDefinition resources and view run responses.
Tests login prompts, OAuth flow initiation, callback error handling,
and server capability detection. Focuses on behaviours that can be
reliably tested via API mocking without a real OAuth provider.
…ownership checks

Export operations ($export, view-export, bulk-submit) now return FHIR
Parameters resources directly. A new ParametersToJsonInterceptor
converts responses to JSON format when clients specify Accept:
application/json.

Removed redundant ownership validation from ImportProvider and
ExportOperationHelper. These checks compared the current user to the
job owner, but since AsyncAspect preserves the same authentication
context, the values were always equal. Ownership validation now occurs
at the correct layer: JobProvider.$job for status polling and
ExportResultProvider.$result for file downloads.

Added JobProviderSecurityTest to verify ownership enforcement at the
$job endpoint.
The server now returns Parameters resources (converted to plain JSON)
instead of Binary-wrapped responses for export status operations.
Removed the now-unused Binary detection and base64 decoding logic from
bulkExportStatus, jobStatus, and bulkSubmitStatus functions.
Test assertions used incorrect FHIR parameter types and names:
- Changed valueCoding to valueCode for simple code values
- Changed parameter name from "mode" to "saveMode"
Align with server-side changes that now return FHIR Parameters
resources from export operations. Added getExportOutputFiles()
helper function to extract output file entries from the Parameters
structure. Fixed incorrect nesting in e2e test fixture.
Updates the API endpoint used for job status polling and cancellation
across the UI codebase, including tests and E2E specs.
Create ViewExecutionHelper to eliminate code duplication between
ViewDefinitionRunProvider and ViewDefinitionInstanceRunProvider.
Both providers now delegate view execution to the shared helper.
Consolidates the extractPatientIdsFromGroup method that was duplicated
across ViewExecutionHelper, ViewDefinitionExportProvider, and
GroupExportProvider into a new shared GroupMemberService.
Adds a new JSON output format that returns query results as a single
JSON document containing an array of objects. Each object matches the
format of individual NDJSON lines. Unlike NDJSON, JSON cannot be
streamed row-by-row so results are collected and written at once.
Replace "Select all" and "Clear" buttons with just "Clear", and add
inline note explaining that leaving none selected exports all types.
Extracted common job cancellation patterns into helper methods to
eliminate duplicate log messages and improve maintainability. Also
fixed incorrect log message in onStageCompleted that said "Stage
submitted" instead of "Stage completed".
Users were always redirected to the Dashboard after login instead of the
page they were on when they clicked the login button. This was caused by
React StrictMode's double-mount behaviour clearing the return URL from
sessionStorage before it could be used.

Split getAndClearReturnUrl into separate getReturnUrl and clearReturnUrl
functions so the URL isn't lost during StrictMode's effect re-execution.
Added E2E test that properly mocks fhirclient's sessionStorage state.
Cache invalidation was slow because it scanned all Delta tables to
compute a new cache key. Added an invalidate(tablePath) overload that
queries only the modified table's timestamp, reducing the operation
from O(n) to O(1) for single-resource mutations.

Update and delete operations now use the optimised method, while bulk
import continues to scan all tables since it modifies multiple
resource types.
Add Javadoc to constructor, add final modifiers to parameters, add
@nonnull annotation to Cacheable interface method, and remove unused
getTableUrl method.
Replace null executor arguments with a no-op ThreadPoolTaskExecutor
to satisfy the @nonnull annotation on the CacheableDatabase
constructor. Also add missing final modifiers and chain assertions.
Adds pathling.operations.* configuration properties to enable/disable
individual server operations. All operations are enabled by default.
Disabled operations return a client error and are excluded from the
CapabilityStatement.

Configurable operations include CRUD (create, read, update, delete,
search), batch, export (system/patient/group), import, import-pnp,
viewdefinition-run (system/instance), viewdefinition-export, and
bulk-submit.
The _format parameter remains optional and overrides the Accept header
when provided. Adds fromAcceptHeader() method to ViewOutputFormat with
quality value support. Refactors tests to use parameterized tests and
improved AssertJ assertions.
Adds a comprehensive test that verifies ViewDefinition resources can be
parsed from JSON, encoded to Spark InternalRow, decoded back to
ViewDefinitionResource, and serialised to JSON that matches the original.

Uses 12 real-world ViewDefinition examples from:
- FHIR SQL on FHIR IG (6 views)
- sql-on-fhir-evaluation repository (6 views)

The test currently fails, revealing missing support for:
- fhirVersion field
- valueCode constant values

Adds JSONAssert as a test dependency for JSON comparison.
Adds support for the fhirVersion field (array of codes) to
ViewDefinitionResource to enable round-trip encoding of ViewDefinitions
that specify FHIR version compatibility.

Expands the constant.value[x] choice type to support all types defined
in the SQL on FHIR specification: Base64Binary, Boolean, Canonical,
Code, Date, DateTime, Decimal, Id, Instant, Integer, Oid, PositiveInt,
String, Time, UnsignedInt, Uri, Url, and Uuid.

Adds ConstantTypes.json test fixture to verify round-trip encoding of
the supported constant value types.
The server module was using unshaded Gson while ConstantDeclarationTypeAdapterFactory
(bundled in library-runtime) uses shaded Gson, causing a type mismatch when
registering the adapter. This prevented valueCode, valueString, and other
constant value types from being parsed in ViewDefinitions.

Removed the direct Gson dependency and switched all server code to use the
shaded Gson from library-runtime.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fhirpath Related to fhirpath reference implementation new feature New feature or request

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

3 participants