fix(serdes): use fully-expanded schema for content-based registry lookups #8437
Open
carlesarnal wants to merge 1 commit into
…kups

Fixes #3602. When an Avro schema has a nested record as the sole type of a field, AvroSchemaParser replaces the full definition with just the record name in the raw schema bytes. This causes content-based lookups in DefaultSchemaResolver to fail because the registry has the fully-expanded schema.

Add a referencelessRawSchema field to ParsedSchema that stores the fully-expanded schema (via Schema.toString() without reference replacement). Use this version in handleResolveSchemaByContent() so content searches match what's actually stored in the registry.

Signed-off-by: Carles Arnal <carlesarnal@gmail.com>
PR auto-accepted (trusted author). Full test suite will run. A maintainer can use |
The test suite was cancelled for commit edd2d26. See the workflow run. Use |
Verify — ✅ passed (run)



Summary
Fixes #3602. When an Avro schema has a nested record as the sole type of a field (not in a union), the AvroSchemaParser replaces the full nested record definition with just its name in the serialized raw schema. This causes DefaultSchemaResolver.handleResolveSchemaByContent() to send a schema to the registry that doesn't match the fully-expanded schema stored there, resulting in a lookup failure and an NPE.

Root Cause
AvroSchemaParser.getSchemaFromData() calls schema.toString(references, false), which replaces nested record definitions in the reference set with just their name (e.g., "type": "record1Avro" instead of the full record definition). This canonicalized form is stored as rawSchema and is what DefaultSchemaResolver sends to the registry for content-based lookups. The registry has the fully-expanded schema, so the content search returns no results.

The auto-create path works correctly because it sends both the referenced schema and the reference links. But the content-search path sends only the referenced schema without any reference context.
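The mismatch described above can be illustrated with a minimal sketch that has no Avro or registry dependency. Everything here is hypothetical: `ContentLookupSketch`, `findByContent`, the `Outer` schema, and the in-memory map standing in for the registry are illustration only, and the real registry's matching rules are likely more involved than exact string equality.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ContentLookupSketch {

    // Fully-expanded form: the nested record definition is inlined.
    // This stands in for what the registry has stored.
    static final String EXPANDED =
        "{\"type\":\"record\",\"name\":\"Outer\",\"fields\":[{\"name\":\"inner\",\"type\":"
        + "{\"type\":\"record\",\"name\":\"record1Avro\",\"fields\":"
        + "[{\"name\":\"id\",\"type\":\"string\"}]}}]}";

    // Name-only form: the nested record is replaced by its name,
    // as happens when it is treated as a reference.
    static final String NAME_ONLY =
        "{\"type\":\"record\",\"name\":\"Outer\",\"fields\":"
        + "[{\"name\":\"inner\",\"type\":\"record1Avro\"}]}";

    // Hypothetical stand-in for the registry: content -> global id.
    static Map<String, Long> newRegistry() {
        Map<String, Long> registry = new HashMap<>();
        registry.put(EXPANDED, 1L);
        return registry;
    }

    // Hypothetical content search: succeeds only on an exact content match.
    static Optional<Long> findByContent(Map<String, Long> registry, String content) {
        return Optional.ofNullable(registry.get(content));
    }

    public static void main(String[] args) {
        Map<String, Long> registry = newRegistry();
        // The name-only form finds nothing; the expanded form matches.
        System.out.println(findByContent(registry, NAME_ONLY).isPresent()); // false
        System.out.println(findByContent(registry, EXPANDED).isPresent());  // true
    }
}
```

In this simplified model, the empty result for the name-only form corresponds to the failed lookup that the PR description ties to the NPE.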
Changes
- ParsedSchema.java: Added getReferencelessRawSchema() default method that falls back to getRawSchema() when not explicitly set
- ParsedSchemaImpl.java: Added referencelessRawSchema field with getter (returns field if set, otherwise falls back to rawSchema) and builder-style setter
- AvroSchemaParser.java: In getSchemaFromData(), now stores the fully-expanded schema via s.toString() as referencelessRawSchema alongside the existing reference-canonicalized rawSchema
- DefaultSchemaResolver.java: handleResolveSchemaByContent() now uses getReferencelessRawSchema() instead of getRawSchema() for content-based lookups
- AvroSchemaParserDuplicateReferencesTest.java: Added testReferencelessRawSchemaContainsFullNestedDefinition() test that reproduces the exact schema from issue #3602 ("Java Client Lib unable to fetch existing schema from registry when producing a message")

Test plan
- testReferencelessRawSchemaContainsFullNestedDefinition() verifies that the referenceless raw schema contains full nested record definitions while the raw schema has name-only references
- ./mvnw test -pl serdes/generic/serde-common-avro
- ./mvnw checkstyle:check -pl schema-resolver,serdes/generic/serde-common-avro
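The fallback behaviour listed under Changes can be sketched roughly as follows. This is not the actual diff: the type names `ParsedSchemaSketch` and `ParsedSchemaImplSketch`, the `byte[]` field types, the setter names, and the `contentForLookup` helper are assumptions made for illustration. Only the method names `getRawSchema()` and `getReferencelessRawSchema()` come from the description.

```java
import java.nio.charset.StandardCharsets;

public class FallbackGetterSketch {

    // Simplified stand-in for the ParsedSchema interface (assumed shape).
    interface ParsedSchemaSketch {
        byte[] getRawSchema();

        // Default: parsers that never set an expanded form keep their old behaviour.
        default byte[] getReferencelessRawSchema() {
            return getRawSchema();
        }
    }

    // Simplified stand-in for ParsedSchemaImpl (assumed shape).
    static class ParsedSchemaImplSketch implements ParsedSchemaSketch {
        private byte[] rawSchema;
        private byte[] referencelessRawSchema;

        @Override
        public byte[] getRawSchema() {
            return rawSchema;
        }

        @Override
        public byte[] getReferencelessRawSchema() {
            // Return the expanded form if set, otherwise fall back to rawSchema.
            return referencelessRawSchema != null ? referencelessRawSchema : rawSchema;
        }

        // Builder-style setters (hypothetical names) return this for chaining.
        ParsedSchemaImplSketch setRawSchema(byte[] value) {
            this.rawSchema = value;
            return this;
        }

        ParsedSchemaImplSketch setReferencelessRawSchema(byte[] value) {
            this.referencelessRawSchema = value;
            return this;
        }
    }

    // Hypothetical helper: what a content-based lookup would send.
    static String contentForLookup(ParsedSchemaSketch schema) {
        return new String(schema.getReferencelessRawSchema(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] nameOnly = "name-only".getBytes(StandardCharsets.UTF_8);
        byte[] expanded = "expanded".getBytes(StandardCharsets.UTF_8);

        ParsedSchemaImplSketch unset = new ParsedSchemaImplSketch().setRawSchema(nameOnly);
        ParsedSchemaImplSketch set = new ParsedSchemaImplSketch()
            .setRawSchema(nameOnly)
            .setReferencelessRawSchema(expanded);

        System.out.println(contentForLookup(unset)); // name-only
        System.out.println(contentForLookup(set));   // expanded
    }
}
```

One consequence of this shape is that schema types whose parsers never set the expanded form would see no change in what the lookup sends.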