Fix NoClassDefFoundError for MetadataVersionUtil in Cosmos Spark connector #48
Open
xinlian12 wants to merge 14 commits into
…ation - Java-6144129 (Azure#48793)
* Configurations: 'specification/azurestackhci/resource-manager/Microsoft.AzureStackHCI/StackHCI/tspconfig.yaml', API Version: 2026-04-01-preview, SDK Release Type: beta, and CommitSHA: 'c22e8792df237fd9afe601d69e305504679c42af' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=6144129 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release.
* fix missed version update
* Configurations: 'specification/azurestackhci/resource-manager/Microsoft.AzureStackHCI/StackHCI/tspconfig.yaml', API Version: 2026-04-01-preview, SDK Release Type: beta, and CommitSHA: '7f6945ba66f4adffc66a21e9700be37975a4e157' in SpecRepo: 'https://github.com/Azure/azure-rest-api-specs' Pipeline run: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=6150443 Refer to https://eng.ms/docs/products/azure-developer-experience/develop/sdk-release/sdk-release-prerequisites to prepare for SDK release.
---------
Co-authored-by: Weidong Xu <weidxu@microsoft.com>
Co-authored-by: Scott Beddall <scbedd@microsoft.com>
…ector Inline version validation logic in ChangeFeedInitialOffsetWriter instead of depending on Spark-internal MetadataVersionUtil, which has been relocated in Databricks Runtime 17.3 LTS (Spark 4.0). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ChangeFeedInitialOffsetWriterSpec with tests covering:
- Valid version strings within supported range
- Version exceeding max supported (UnsupportedLogVersion)
- Malformed versions: non-numeric, empty, missing v prefix, v0, negative, bare v
Widen companion object visibility to private[spark] for testability.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
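The validation described in the two commits above (parse a vN marker, then check bounds) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the object name VersionValidationSketch, the use of IllegalStateException, and the message texts are all assumptions chosen for the example.

```scala
// Hypothetical stand-in for the inlined helper; names and messages are assumptions.
object VersionValidationSketch {

  /** Parses a "vN" version marker and checks that 1 <= N <= maxSupportedVersion. */
  def validateVersion(text: String, maxSupportedVersion: Int): Int = {
    // Single failure path for every malformed input.
    def malformed(): Nothing =
      throw new IllegalStateException(
        s"Log file was malformed: failed to read correct log version from $text.")

    // Rejects empty input and input without the leading 'v'.
    if (text.isEmpty || text.charAt(0) != 'v') malformed()

    // Rejects a bare "v" and non-numeric suffixes.
    val version =
      try text.substring(1).toInt
      catch { case _: NumberFormatException => malformed() }

    // Rejects v0 and negative versions.
    if (version <= 0) malformed()

    // A well-formed version above the supported maximum is a distinct error.
    if (version > maxSupportedVersion) {
      throw new IllegalStateException(
        "UnsupportedLogVersion: maximum supported log version is " +
          s"v$maxSupportedVersion, but encountered v$version.")
    }
    version
  }
}
```

Under these assumptions, validateVersion("v1", 1) returns 1, validateVersion("v2", 1) fails with the UnsupportedLogVersion message, and each of "abc", "", "1", "v0", "v-1" and "v" fails as malformed, which lines up with the test categories listed in the commit.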
…st notebooks
Add structured streaming scenarios using cosmos.oltp.changeFeed to both basicScenario.scala and basicScenarioAadManagedIdentity.scala notebooks. These scenarios exercise the ChangeFeedInitialOffsetWriter and HDFSMetadataLog code paths that can break on certain Spark distributions (e.g. Databricks Runtime 17.3+).
Each scenario:
- Creates a sink container
- Reads change feed from source via readStream with micro-batch
- Writes to sink container via writeStream
- Validates records were copied
- Cleans up both containers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use file:/tmp/ instead of /tmp/ for checkpoint location to avoid DBFS access issues on Unity Catalog-enabled Databricks clusters.
Also:
- Remove unused Trigger import
- Stop query before reading sink to avoid race conditions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
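The difference between the two checkpoint locations comes down to whether the path names a URI scheme. A scheme-less path is resolved against the cluster's default file system, while file:/tmp/ pins the location to the local file system. The small sketch below only inspects the scheme and does not touch Spark or any file system; the object name CheckpointPathSketch and the example paths are hypothetical.

```scala
import java.net.URI

// Hypothetical helper for illustration; not part of the notebooks.
object CheckpointPathSketch {
  /** Returns the URI scheme of a checkpoint location, or None if it has none. */
  def schemeOf(location: String): Option[String] =
    Option(new URI(location).getScheme)
}
```

With this helper, schemeOf("/tmp/cosmos-checkpoint") yields None (resolution is left to the default file system), while schemeOf("file:/tmp/cosmos-checkpoint") yields Some("file").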
Replace cosmos.oltp sink with in-memory sink to eliminate the need for a separate sink container. This avoids 404 errors from sink container creation/resolution and removes checkpoint path concerns. The test still exercises the full ChangeFeedInitialOffsetWriter and HDFSMetadataLog code paths (readStream with cosmos.oltp.changeFeed), which is the goal for validating the MetadataVersionUtil fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both notebooks now use the same pattern: derive changeFeedCfg from the existing cfg map (which already has the correct auth config) plus the change feed-specific options. Write to an in-memory sink to avoid container creation issues. This ensures both key-based and AAD/MSI notebooks exercise identical streaming logic. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The MSI notebook shares a cluster with basicScenario, and the Cosmos client cache retains references from the first notebook's proactive connection init. When basicScenario drops the source container during cleanup, the MSI notebook's change feed streaming fails with 404 on the cached (now-deleted) container. The change feed streaming test in basicScenario already provides sufficient coverage for the ChangeFeedInitialOffsetWriter code paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add detailed logging to capture:
- Endpoint, database, container, auth config used
- Source container record count before streaming
- Streaming query ID
- Full exception details on failure
This will help diagnose why the change feed streaming fails on the MSI notebook but succeeds on the key-based one.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The MSI change feed test passes on a fresh cluster but fails when basicScenario runs first on the same cluster without restart. The basicScenario leaves cached Cosmos client state (proactive connection init on the ephemeral endpoint) that causes the MSI streaming query to resolve to the wrong endpoint, resulting in a 404. The change feed test in basicScenario provides sufficient coverage for the ChangeFeedInitialOffsetWriter/HDFSMetadataLog code paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes a NoClassDefFoundError for MetadataVersionUtil in the Cosmos Spark connector when running on Databricks Runtime 17.3 LTS (Spark 4.0), where org.apache.spark.sql.execution.streaming.MetadataVersionUtil has been relocated/removed.

Changes
- Inline the version validation logic in ChangeFeedInitialOffsetWriter instead of depending on the Spark-internal MetadataVersionUtil class
- Add a validateVersion method to the companion object that replicates the same behavior
- … MetadataVersionUtil

Why
MetadataVersionUtil is a Spark-internal utility that is not part of the public API. Databricks Runtime 17.3 LTS (based on Spark 4.0) relocated this class, causing a NoClassDefFoundError at runtime when the Cosmos Spark connector tries to deserialize change feed offsets.

Since the validation logic is straightforward (parse the version number from the vN format, check bounds), inlining it removes the fragile dependency on Spark internals.

Impact
- All connector modules (azure-cosmos-spark_3-*) share this source file via Maven add-source, so the fix applies to all variants automatically.
- … MetadataVersionUtil.validateVersion semantics exactly.

Verification
- … azure-cosmos-spark_3-5_2-12
- No references to MetadataVersionUtil remain in the codebase