Skip to content

feat(rest): expose canonicalization endpoints#8336

Open
carlesarnal wants to merge 5 commits into
mainfrom
feat/issue-2473-canonicalization-endpoints
Open

feat(rest): expose canonicalization endpoints#8336
carlesarnal wants to merge 5 commits into
mainfrom
feat/issue-2473-canonicalization-endpoints

Conversation

@carlesarnal

Copy link
Copy Markdown
Member

Summary

Expose schema canonicalization as a first-class REST API feature. Users can now retrieve the canonical form of schemas for comparison purposes — both for arbitrary content and for stored artifact versions.

Fixes #2473

Root Cause

Canonicalization was used only internally for content hash matching (e.g. the canonical query parameter on POST /groups/{groupId}/artifacts for ifExists logic). There was no way to GET the canonicalized form of a schema, which users need for comparing schemas produced by different tools or environments.

Changes

  • common/src/main/resources/META-INF/openapi.json: Added POST /system/canonicalize endpoint with required artifactType query parameter. Added canonical boolean query parameter to GET /groups/{groupId}/artifacts/{artifactId}/versions/{versionExpression}/content.
  • app/.../rest/v3/impl/SystemResourceImpl.java: Implemented canonicalizeContent() — reads the request body, delegates to RegistryStorageContentUtils.canonicalizeContent(), and returns the canonical form. Auth level: Read.
  • app/.../rest/v3/impl/GroupsResourceImpl.java: Extended getArtifactVersionContent() with the new Boolean canonical parameter. When true, resolves references from storage and canonicalizes the content before returning it.
  • app/src/test/.../noprofile/rest/v3/SystemResourceTest.java: Added 6 new tests covering JSON, OpenAPI, and Avro canonicalization; empty body error case; stored content canonical retrieval; and a regression test confirming non-canonical retrieval is unchanged.

Test plan

  • POST /system/canonicalize with JSON content (keys reordered)
  • POST /system/canonicalize with OpenAPI content
  • POST /system/canonicalize with Avro content
  • POST /system/canonicalize with empty body → 400
  • GET .../content?canonical=true for stored JSON artifact
  • GET .../content without canonical param (regression — unchanged behavior)
  • Checkstyle passes with 0 violations
  • SDK updates (java-sdk, go-sdk, python-sdk, typescript-sdk) — tracked as follow-up

Add two new ways to retrieve canonicalized schema content:

1. POST /apis/registry/v3/system/canonicalize?artifactType=<type>
   Accepts arbitrary schema content and returns its canonicalized form.
   Supports all artifact types (JSON, OpenAPI, Avro, Protobuf, etc.).

2. GET .../versions/{versionExpression}/content?canonical=true
   New query parameter on the existing content retrieval endpoint that
   returns stored artifact content in its canonical form.

Closes #2473

Signed-off-by: Carles Arnal <carnalca@redhat.com>
@github-actions github-actions Bot added lifecycle/ready-for-review Ready for review, full tests running lifecycle/waiting-on-maintainer Blocked on maintainer action labels Jun 22, 2026
@github-actions

Copy link
Copy Markdown

PR auto-accepted (trusted author). Full test suite will run.

A maintainer can use /skip-review to skip the review requirement for small changes, or /auto-merge to merge automatically once approved and tested.

@github-actions

Copy link
Copy Markdown

The test suite was cancelled for commit c6cfd78. See the workflow run. Use /retry to re-run.

@github-actions github-actions Bot added lifecycle/waiting-on-author Blocked on contributor action and removed lifecycle/waiting-on-maintainer Blocked on maintainer action labels Jun 22, 2026
@github-actions

Copy link
Copy Markdown

The test suite failed for commit c6cfd78. @carlesarnal, please check the workflow run and push a fix.

@github-actions

This comment has been minimized.

Signed-off-by: Carles Arnal <carnalca@redhat.com>
@github-actions github-actions Bot added lifecycle/tested Full test suite passed for current HEAD lifecycle/waiting-on-maintainer Blocked on maintainer action and removed lifecycle/waiting-on-author Blocked on contributor action labels Jun 23, 2026
Signed-off-by: Carles Arnal <carnalca@redhat.com>
@github-actions github-actions Bot added lifecycle/tested Full test suite passed for current HEAD and removed lifecycle/tested Full test suite passed for current HEAD labels Jun 23, 2026
@carlesarnal carlesarnal requested a review from EricWittmann June 23, 2026 12:13
@carlesarnal

Copy link
Copy Markdown
Member Author

@EricWittmann any other better place than system for the endpoint to given a content, return it canonicalized? admin? anything else?

@github-actions github-actions Bot added the lifecycle/stale No activity for 7+ days label Jun 30, 2026
@github-actions

Copy link
Copy Markdown

This PR has been inactive for 7 days. @carlesarnal, please update or comment
to keep it open. It will be closed in 7 days if no activity occurs.
Use /unstale to remove this label.

@github-actions github-actions Bot removed the lifecycle/stale No activity for 7+ days label Jun 30, 2026
@EricWittmann

Copy link
Copy Markdown
Member

I don't think I would use /system for this. Let me think about other new endpoints we can use.

@EricWittmann

Copy link
Copy Markdown
Member

I think the right home for this is POST /content/canonicalize rather than /system. There's already a /content resource (ContentResource) that handles content-level operations detached from any specific artifact (e.g., /content/references). Canonicalization fits that same pattern — it's a stateless content utility, not a system-level operation.

POST /apis/registry/v3/content/canonicalize?artifactType=AVRO

Map<String, TypedContent> resolvedReferences = RegistryContentUtils
.recursivelyResolveReferences(artifactCell.get().getReferences(),
storage::getContentByReference);
contentToReturn = contentUtils.canonicalizeContent(metaData.getArtifactType(),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Canonicalization runs after handleContentReferences() here — so the canonical output depends on the references query parameter. Do we need a test that exercises both at the same time (e.g. ?references=DEREFERENCE&canonical=true) to make sure the interaction produces the expected result?


@Override
@Authorized(style = AuthorizedStyle.None, level = AuthorizedLevel.Read)
public Response canonicalizeContent(String artifactType, InputStream data) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no validation that artifactType is a known type before passing it to canonicalizeContent(). If someone sends artifactType=GARBAGE, the behavior depends on how the artifact type provider lookup handles unknown types. Should we validate early and return a clear 400?

ContentHandle content = ContentHandle.create(data);
if (content.bytes().length == 0) {
throw new BadRequestException("Empty content is not allowed.");
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using @Context HttpServletRequest to get the content type feels inconsistent with the rest of the codebase — other endpoints get it from the generated JAX-RS interface (e.g. @HeaderParam or @Consumes). This also couples the implementation to the Servlet API. Could the content type come through the generated interface instead?


if (references == null) {
// Check if admin has configured a default reference handling behavior
java.util.Optional<String> configuredDefault = restConfig.getDefaultReferenceHandling();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few existing comments were removed in this method (the "Check if admin has configured..." comment here, the "No configuration..." comment below, and the "Throw 404..." comment further down). Was this intentional cleanup or accidental?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental for sure.

}

@Test
public void testCanonicalizeAvroContent() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OpenAPI and Avro tests only assert that certain fields exist in the response (e.g. components is not null, openapi equals 3.0.2) — they don't verify that canonicalization actually happened (e.g. keys are sorted, whitespace is normalized). The JSON test does this well with an exact string comparison. Should these tests have similarly strong assertions to confirm the output is actually in canonical form?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they must do it.

- Validate artifactType early and return 400 for unknown types
- Replace @context HttpServletRequest with @context HttpHeaders (JAX-RS)
- Strengthen OpenAPI and Avro test assertions with exact string comparison
- Add test for unknown artifact type returning 400
- Add test for references + canonical query param interaction

Signed-off-by: Carles Arnal <carnalca@redhat.com>
@github-actions github-actions Bot added lifecycle/tested Full test suite passed for current HEAD and removed lifecycle/tested Full test suite passed for current HEAD labels Jul 2, 2026
@github-actions

This comment has been minimized.

Move the canonicalize endpoint to POST /content/canonicalize as suggested
in review — it's a stateless content utility that fits alongside the
existing /content/references endpoint.

- Move implementation from SystemResourceImpl to ContentResourceImpl
- Move tests from SystemResourceTest to ContentResourceTest
- Restore SystemResourceImpl to its original state
- Regenerate Go SDK

Signed-off-by: Carles Arnal <carnalca@redhat.com>
@github-actions github-actions Bot removed the lifecycle/tested Full test suite passed for current HEAD label Jul 2, 2026
@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026

Copy link
Copy Markdown

@github-actions github-actions Bot added the lifecycle/tested Full test suite passed for current HEAD label Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Verify — ✅ passed (run)

Phase Status
Lint and Validate 🟢
Build 🟢
Unit Tests 🟢
Integration Tests 🟢
Extra Tests 🟢
SDK Verification 🟢
CLI Verification
Operator Tests
Change detection

java: true, ui: false, integration: true, sdk: true, cli: false, go-sdk-gen: true, operator: false, ci: false

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lifecycle/ready-for-review Ready for review, full tests running lifecycle/tested Full test suite passed for current HEAD lifecycle/waiting-on-maintainer Blocked on maintainer action

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose canonicalization endpoints

2 participants