Fix NoClassDefFoundError for MetadataVersionUtil in Cosmos Spark connector by xinlian12 · Pull Request #48 · xinlian12/azure-sdk-for-java

xinlian12 · 2026-04-16T21:27:40Z

Summary

Fixes a NoClassDefFoundError for MetadataVersionUtil in the Cosmos Spark connector when running on Databricks Runtime 17.3 LTS (Spark 4.0), where org.apache.spark.sql.execution.streaming.MetadataVersionUtil has been relocated/removed.

Changes

Inlined version validation logic in ChangeFeedInitialOffsetWriter instead of depending on the Spark-internal MetadataVersionUtil class
Added a validateVersion method to the companion object that replicates the same behavior
Removed the import of MetadataVersionUtil

Why

MetadataVersionUtil is a Spark-internal utility that is not part of the public API. Databricks Runtime 17.3 LTS (based on Spark 4.0) relocated this class, causing a NoClassDefFoundError at runtime when the Cosmos Spark connector tries to deserialize change feed offsets.

Since the validation logic is straightforward (parse version number from vN format, check bounds), inlining it removes the fragile dependency on Spark internals.
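For reference, a minimal sketch of what such an inlined check could look like. This is an illustration written from the description above (parse the number after the `v` prefix, reject non-positive values, reject values above the supported maximum); it is not the code from this PR. The object name `VersionValidation` and the exact error messages are assumptions.

```scala
import scala.util.Try

// Hypothetical stand-in for the inlined helper; not the PR's actual code.
object VersionValidation {

  // Parses a "vN" version string and returns N if 0 < N <= maxSupportedVersion.
  // Throws IllegalStateException for malformed or unsupported versions.
  def validateVersion(text: String, maxSupportedVersion: Int): Int = {
    def malformed(): Nothing = throw new IllegalStateException(
      s"Log file was malformed: failed to read correct log version from $text.")

    // The string must be non-empty and start with the 'v' prefix.
    if (text.isEmpty || text.charAt(0) != 'v') malformed()

    // Everything after the leading 'v' must parse as an integer.
    val version = Try(text.substring(1).toInt).getOrElse(malformed())

    // Versions start at 1; zero and negative values are treated as malformed.
    if (version <= 0) malformed()

    // A version above the supported maximum comes from a newer writer.
    if (version > maxSupportedVersion) {
      throw new IllegalStateException(
        s"UnsupportedLogVersion: maximum supported log version is v$maxSupportedVersion, " +
          s"but encountered v$version.")
    }
    version
  }
}
```

With this sketch, `VersionValidation.validateVersion("v1", 1)` returns `1`, while inputs such as `"v2"` (with a maximum of 1), `"1"`, `""`, `"v0"`, and `"v"` throw `IllegalStateException`.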

Impact

All Spark connector variants (azure-cosmos-spark_3-*) share this source file via Maven add-source, so the fix applies to all variants automatically.
No behavioral change — the inlined logic matches MetadataVersionUtil.validateVersion semantics exactly.

Verification

Compilation verified locally for azure-cosmos-spark_3-5_2-12
No other references to MetadataVersionUtil remain in the codebase

…ation - Java-6144129 (Azure#48793)

* Configurations: 'specification/azurestackhci/resource-manager/Microsoft.AzureStackHCI/StackHCI/tspconfig.yaml', API Version: 2026-04-01-preview, SDK Release Type: beta, and CommitSHA: 'c22e8792df237fd9afe601d69e305504679c42af' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs'
  Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=6144129
  Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release.
* fix missed version update
* Configurations: 'specification/azurestackhci/resource-manager/Microsoft.AzureStackHCI/StackHCI/tspconfig.yaml', API Version: 2026-04-01-preview, SDK Release Type: beta, and CommitSHA: '7f6945ba66f4adffc66a21e9700be37975a4e157' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs'
  Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=6150443
  Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release.

---------

Co-authored-by: Weidong Xu <weidxu@microsoft.com>

) * Copilot hook script to collect user prompt telemetry

Co-authored-by: Scott Beddall <scbedd@microsoft.com>

…ector

Inline version validation logic in ChangeFeedInitialOffsetWriter instead of depending on Spark-internal MetadataVersionUtil, which has been relocated in Databricks Runtime 17.3 LTS (Spark 4.0).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add ChangeFeedInitialOffsetWriterSpec with tests covering:
- Valid version strings within supported range
- Version exceeding max supported (UnsupportedLogVersion)
- Malformed versions: non-numeric, empty, missing v prefix, v0, negative, bare v

Widen companion object visibility to private[spark] for testability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…st notebooks

Add structured streaming scenarios using cosmos.oltp.changeFeed to both basicScenario.scala and basicScenarioAadManagedIdentity.scala notebooks. These scenarios exercise the ChangeFeedInitialOffsetWriter and HDFSMetadataLog code paths that can break on certain Spark distributions (e.g. Databricks Runtime 17.3+).

Each scenario:
- Creates a sink container
- Reads change feed from source via readStream with micro-batch
- Writes to sink container via writeStream
- Validates records were copied
- Cleans up both containers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Use file:/tmp/ instead of /tmp/ for checkpoint location to avoid DBFS access issues on Unity Catalog-enabled Databricks clusters.

Also:
- Remove unused Trigger import
- Stop query before reading sink to avoid race conditions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace cosmos.oltp sink with in-memory sink to eliminate the need for a separate sink container. This avoids 404 errors from sink container creation/resolution and removes checkpoint path concerns. The test still exercises the full ChangeFeedInitialOffsetWriter and HDFSMetadataLog code paths (readStream with cosmos.oltp.changeFeed), which is the goal for validating the MetadataVersionUtil fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Both notebooks now use the same pattern: derive changeFeedCfg from the existing cfg map (which already has the correct auth config) plus the change feed-specific options. Write to an in-memory sink to avoid container creation issues. This ensures both key-based and AAD/MSI notebooks exercise identical streaming logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
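As an illustration of the pattern described in this commit, the sketch below derives a change feed config from a base config map. The endpoint, database, and container values are placeholders, and the particular change feed options chosen here are assumptions; the notebooks' actual `cfg` contents and options are not shown on this page.

```scala
// Illustrative only: placeholder values, not the notebooks' real configuration.
object ChangeFeedConfig {

  // Stand-in for the existing `cfg` map, which in the notebooks already
  // carries the correct auth settings (key-based or AAD/MSI).
  val cfg: Map[String, String] = Map(
    "spark.cosmos.accountEndpoint" -> "https://<account>.documents.azure.com:443/",
    "spark.cosmos.database" -> "<database>",
    "spark.cosmos.container" -> "<container>"
  )

  // Change feed options layered on top of the base config. On key
  // collisions, entries from the right-hand map win.
  val changeFeedCfg: Map[String, String] = cfg ++ Map(
    "spark.cosmos.read.inferSchema.enabled" -> "false",
    "spark.cosmos.changeFeed.startFrom" -> "Beginning",
    "spark.cosmos.changeFeed.mode" -> "Incremental"
  )
}
```

In a notebook, a map like this would typically be passed to `spark.readStream.format("cosmos.oltp.changeFeed").options(changeFeedCfg).load()`, with the result written via `writeStream.format("memory").queryName(...)` so that no sink container is needed.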

The MSI notebook shares a cluster with basicScenario, and the Cosmos client cache retains references from the first notebook's proactive connection init. When basicScenario drops the source container during cleanup, the MSI notebook's change feed streaming fails with 404 on the cached (now-deleted) container. The change feed streaming test in basicScenario already provides sufficient coverage for the ChangeFeedInitialOffsetWriter code paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add detailed logging to capture:
- Endpoint, database, container, auth config used
- Source container record count before streaming
- Streaming query ID
- Full exception details on failure

This will help diagnose why the change feed streaming fails on the MSI notebook but succeeds on the key-based one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The MSI change feed test passes on a fresh cluster but fails when basicScenario runs first on the same cluster without restart. The basicScenario leaves cached Cosmos client state (proactive connection init on the ephemeral endpoint) that causes the MSI streaming query to resolve to the wrong endpoint, resulting in a 404. The change feed test in basicScenario provides sufficient coverage for the ChangeFeedInitialOffsetWriter/HDFSMetadataLog code paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azure-sdk and others added 14 commits April 15, 2026 19:13

Sync .github directory with azure-sdk-tools repository (Azure#48822)

15af6c9

Sync eng/common directory with azure-sdk-tools for PR 15153 (Azure#48828)

d400aae

remove bypass local dns (Azure#48834)

12b3511

Remove change feed streaming scenarios from Databricks notebooks

11bfba7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xinlian12 wants to merge 14 commits into main from fix/cosmos-spark-metadataversion-noclass
