fix(serdes): use fully-expanded schema for content-based registry lookups#8437

Open
carlesarnal wants to merge 1 commit into main from fix/3602-avro-nested-record-lookup

@carlesarnal

Member

Summary

Fixes #3602. When an Avro schema has a nested record as the sole type of a field (not in a union), the AvroSchemaParser replaces the full nested record definition with just its name in the serialized raw schema. This causes DefaultSchemaResolver.handleResolveSchemaByContent() to send a schema to the registry that doesn't match the fully-expanded schema stored there, resulting in a lookup failure and NPE.

Root Cause

AvroSchemaParser.getSchemaFromData() calls schema.toString(references, false) which replaces nested record definitions in the reference set with just their name (e.g., "type": "record1Avro" instead of the full record definition). This canonicalized form is stored as rawSchema and is what DefaultSchemaResolver sends to the registry for content-based lookups. The registry has the fully-expanded schema, so the content search returns no results.

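The auto-create path works correctly because it sends the reference-canonicalized schema together with its reference links. The content-search path, however, sends only the reference-canonicalized schema without any reference context, so the name-only form cannot be matched against the fully-expanded content stored in the registry.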
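
The mismatch described in this section can be modeled with a small, JDK-only sketch. It is a toy stand-in for a content-keyed lookup, not the Apicurio Registry API or the Avro library: the class `ContentLookupSketch`, its methods, the outer record name `outerAvro`, and the field names are all hypothetical. Only `record1Avro` comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of a content-keyed lookup; NOT the real registry or Avro API. */
public class ContentLookupSketch {

    // Fully-expanded form: the nested record definition is inlined in the field.
    public static final String EXPANDED =
        "{\"type\":\"record\",\"name\":\"outerAvro\",\"fields\":["
        + "{\"name\":\"inner\",\"type\":{\"type\":\"record\",\"name\":\"record1Avro\","
        + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}}]}";

    // Reference-canonicalized form: the nested record is replaced by its name.
    public static final String NAME_ONLY =
        "{\"type\":\"record\",\"name\":\"outerAvro\",\"fields\":["
        + "{\"name\":\"inner\",\"type\":\"record1Avro\"}]}";

    // Hypothetical store: exact schema content mapped to an artifact id.
    private final Map<String, Long> idsByContent = new HashMap<>();

    public void register(String content, long id) {
        idsByContent.put(content, id);
    }

    public Optional<Long> findByContent(String content) {
        return Optional.ofNullable(idsByContent.get(content));
    }

    public static void main(String[] args) {
        ContentLookupSketch registry = new ContentLookupSketch();
        registry.register(EXPANDED, 1L);

        // Searching with the name-only form finds nothing.
        System.out.println(registry.findByContent(NAME_ONLY).isPresent()); // prints false
        // Searching with the fully-expanded form matches the stored content.
        System.out.println(registry.findByContent(EXPANDED).isPresent()); // prints true
    }
}
```

An empty result like the first one is the kind of outcome that, left unhandled by the caller, could surface as the NPE reported in #3602.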

Changes

  • ParsedSchema.java: Added getReferencelessRawSchema() default method that falls back to getRawSchema() when not explicitly set
  • ParsedSchemaImpl.java: Added referencelessRawSchema field with getter (returns field if set, otherwise falls back to rawSchema) and builder-style setter
  • AvroSchemaParser.java: In getSchemaFromData(), now stores the fully-expanded schema via s.toString() as referencelessRawSchema alongside the existing reference-canonicalized rawSchema
  • DefaultSchemaResolver.java: handleResolveSchemaByContent() now uses getReferencelessRawSchema() instead of getRawSchema() for content-based lookups
  • AvroSchemaParserDuplicateReferencesTest.java: Added testReferencelessRawSchemaContainsFullNestedDefinition() test that reproduces the exact schema from issue #3602 (Java Client Lib unable to fetch existing schema from registry when producing a message)
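
The fallback behavior listed above could look roughly like the following sketch. It is a simplified illustration, not the actual diff: `ParsedSchemaSketch` and `ParsedSchemaImplSketch` are hypothetical stand-ins for `ParsedSchema` and `ParsedSchemaImpl`, the `byte[]` types and the setter names are assumptions, and `lookupContent` is a hypothetical helper that only mimics which bytes a content-based lookup would pick.

```java
import java.nio.charset.StandardCharsets;

public class ReferencelessFallbackSketch {

    /** Simplified stand-in for ParsedSchema (assumed shape, not the real interface). */
    public interface ParsedSchemaSketch {
        byte[] getRawSchema();

        // Default: implementations that never store an expanded form
        // keep behaving as before by returning the raw schema.
        default byte[] getReferencelessRawSchema() {
            return getRawSchema();
        }
    }

    /** Simplified stand-in for ParsedSchemaImpl with builder-style setters. */
    public static class ParsedSchemaImplSketch implements ParsedSchemaSketch {
        private byte[] rawSchema;
        private byte[] referencelessRawSchema;

        public ParsedSchemaImplSketch setRawSchema(byte[] rawSchema) {
            this.rawSchema = rawSchema;
            return this;
        }

        public ParsedSchemaImplSketch setReferencelessRawSchema(byte[] expanded) {
            this.referencelessRawSchema = expanded;
            return this;
        }

        @Override
        public byte[] getRawSchema() {
            return rawSchema;
        }

        // Returns the expanded form if set, otherwise falls back to rawSchema.
        @Override
        public byte[] getReferencelessRawSchema() {
            return referencelessRawSchema != null ? referencelessRawSchema : rawSchema;
        }
    }

    /** Hypothetical helper: the content a content-based lookup would send. */
    public static String lookupContent(ParsedSchemaSketch schema) {
        return new String(schema.getReferencelessRawSchema(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ParsedSchemaImplSketch schema = new ParsedSchemaImplSketch()
            .setRawSchema("name-only".getBytes(StandardCharsets.UTF_8));
        System.out.println(lookupContent(schema)); // prints name-only

        schema.setReferencelessRawSchema("expanded".getBytes(StandardCharsets.UTF_8));
        System.out.println(lookupContent(schema)); // prints expanded
    }
}
```

Keeping the fallback in both the default method and the implementation means parsers that never set the expanded form are unaffected by the change.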

Test plan

  • Unit test added: testReferencelessRawSchemaContainsFullNestedDefinition() verifies that the referenceless raw schema contains full nested record definitions while the raw schema has name-only references
  • Existing unit tests pass (./mvnw test -pl serdes/generic/serde-common-avro)
  • Checkstyle passes (./mvnw checkstyle:check -pl schema-resolver,serdes/generic/serde-common-avro)
  • Integration tests with SQL storage variant
  • Integration tests with KafkaSQL storage variant

…kups

Fixes #3602. When an Avro schema has a nested record as the sole type of a
field, AvroSchemaParser replaces the full definition with just the record
name in the raw schema bytes. This causes content-based lookups in
DefaultSchemaResolver to fail because the registry has the fully-expanded
schema.

Add a referencelessRawSchema field to ParsedSchema that stores the
fully-expanded schema (via Schema.toString() without reference
replacement). Use this version in handleResolveSchemaByContent() so
content searches match what's actually stored in the registry.

Signed-off-by: Carles Arnal <carlesarnal@gmail.com>
@github-actions github-actions Bot added the lifecycle/ready-for-review (Ready for review, full tests running) and lifecycle/waiting-on-maintainer (Blocked on maintainer action) labels on Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

PR auto-accepted (trusted author). Full test suite will run.

A maintainer can use /skip-review to skip the review requirement for small changes, or /auto-merge to merge automatically once approved and tested.

@github-actions

github-actions Bot commented Jul 2, 2026

The test suite was cancelled for commit edd2d26. See the workflow run. Use /retry to re-run.

@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026

@github-actions github-actions Bot added the lifecycle/tested (Full test suite passed for current HEAD) label on Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Verify — ✅ passed (run)

Phase | Status
Lint and Validate | 🟢
Build | 🟢
Unit Tests | 🟢
Integration Tests | 🟢
Extra Tests | 🟢
SDK Verification | 🟢
CLI Verification |
Operator Tests |

Change detection

java: true, ui: false, integration: true, sdk: false, cli: false, go-sdk-gen: false, operator: false, ci: false

Labels

  • lifecycle/ready-for-review: Ready for review, full tests running
  • lifecycle/tested: Full test suite passed for current HEAD
  • lifecycle/waiting-on-maintainer: Blocked on maintainer action

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Java Client Lib unable to fetch existing schema from registry when producing a message

1 participant