
Conversation

@zikangh (Contributor) commented Oct 17, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR is Part I of implementing SparkMicroBatchStream.initialSnapshot() to support Kernel-based DSv2 Delta streaming (M1 milestone). It handles the startingVersion read option.
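For orientation, here is a minimal sketch of how a Kernel-based getStartingVersion might resolve that option. This is illustrative only, pieced together from the snippets quoted in the review below; the field names (options, snapshotAtSourceInit) and the snapshot-version accessor are assumptions, not the PR's exact code.

```java
// Illustrative sketch, not the PR's exact implementation.
Optional<Long> getStartingVersion() {
  if (options == null || !options.startingVersion().isDefined()) {
    return Optional.empty();
  }
  DeltaStartingVersion startingVersion = options.startingVersion().get();
  if (startingVersion instanceof StartingVersionLatest$) {
    // "latest" starts the stream after the snapshot taken at source init,
    // mirroring DSv1's DeltaSource.getStartingVersion semantics.
    return Optional.of(snapshotAtSourceInit.getVersion() + 1); // accessor assumed
  }
  return Optional.of(((StartingVersion) startingVersion).version());
}
```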

How was this patch tested?

Parameterized tests verifying parity between DSv1 (DeltaSource) and DSv2 (SparkMicroBatchStream).

Does this PR introduce any user-facing changes?

No

@zikangh (Contributor Author) commented Oct 17, 2025

Hi, could I get a review from @huan233usc @gengliangwang @jerrypeng @tdas please? Thanks!

@zikangh changed the title from "[kernel-spark] Add getStartingVersion() to support Kernel-based DSv2 streaming (InitialSnapshot Part I)" to "[kernel-spark] Add getStartingVersion() for obtaining the initial offset for DSv2 streaming" on Oct 17, 2025
* <p>This is the DSv2 Kernel-based implementation of DeltaSource.getStartingVersion.
*/
Optional<Long> getStartingVersion() {
if (options == null) {

Collaborator:

Can we check non null in the constructor? Objects.requireNonNull

Contributor Author:

I'm leaning toward continuing to support null options -- it should represent optional configurations, not required (e.g. some tests pass in zero options). wdyt?

Collaborator:

What's the difference between null options and new DeltaOptions(Map$.MODULE$.empty(), spark.sessionState().conf())?

I was trying to avoid assigning null if possible, to reduce the risk of NPEs.

Contributor:

When would "options" be null?

it should represent optional configurations, not required (e.g. some tests pass in zero options)

Java supports an "Optional" API. Why not use that so it is clear to the reader this variable is optional.

Contributor Author:

Done.
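(For reference, a minimal sketch of the Optional-based shape discussed here; the constructor signature is illustrative, not necessarily what the PR ended up with:)

```java
// Hold the read options as an Optional so callers never hand in a bare null.
private final Optional<DeltaOptions> options;

SparkMicroBatchStream(Optional<DeltaOptions> options /* , ... */) {
  this.options = Objects.requireNonNull(options, "options must not be null");
}
```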

// so we must check the message. See DeltaErrors.unsupportedTableFeature,
// DeltaErrors.unsupportedReaderFeatures, and DeltaErrors.unsupportedWriterFeatures.
String exceptionMessage = e.getMessage();
if (exceptionMessage != null

Collaborator:

Maybe only keep exceptionMessage.contains("Unsupported Delta reader features")?

Contributor Author:

Good call. What about "Unsupported Delta table feature"?

Collaborator:

"Unsupported Delta table feature"
Let's add that, good catch.

Contributor Author:

Done.
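(The resulting check roughly looks like this; the message strings are the two quoted in this thread:)

```java
String exceptionMessage = e.getMessage();
boolean isUnsupportedFeature =
    exceptionMessage != null
        && (exceptionMessage.contains("Unsupported Delta reader features")
            || exceptionMessage.contains("Unsupported Delta table feature"));
```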

return true;
} catch (KernelException e) {
// Check if it's an unsupported table feature exception
// Kernel throws plain KernelException (not a subclass) for unsupported features,

Collaborator:

I think there is an improvement task for Kernel to have finer-grained error throwing. cc @raveeram-db

Contributor Author:

That would be great. Created an issue: #5369

streamingHelper.checkVersionExists(
version, /* mustBeRecreatable= */ false, /* allowOutOfRange= */ false);
} catch (Exception e) {
throw new RuntimeException("Failed to validate starting version: " + version, e);

Collaborator:

Wondering if it would help to avoid this try-catch block if we made the not-found exception extend AnalysisException.

Contributor Author:

Created #5369

Contributor:

Can we throw the same exception class as DSv1 which would be AnalysisException?

Contributor Author:

  1. @jerrypeng yes, we can throw the DSv1 version of this exception.
  2. Although we should still wrap it in a RuntimeException, because (2a) the caller should not have to handle this exception again and (2b) AnalysisException is also checked, unfortunately. (Sketch below.)
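A hedged sketch of that wrapping (assuming checkVersionExists surfaces the DSv1 AnalysisException; the catch type and message are illustrative):

```java
try {
  streamingHelper.checkVersionExists(
      version, /* mustBeRecreatable= */ false, /* allowOutOfRange= */ false);
} catch (AnalysisException e) {
  // AnalysisException is checked in Java, so re-wrap it so the caller does
  // not have to declare or handle it again.
  throw new RuntimeException("Failed to validate starting version: " + version, e);
}
```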

deltaLog.checkpoint();

// Delete log files for versions 1-5 to make them non-recreatable
// Note: Version 0 is kept because it contains the table schema

Collaborator @huan233usc commented Oct 20, 2025:

I think this produces a table state that will never be reached in practice.

Log-file cleanup goes by timestamp, so we would always delete 0-5 (since 0 is the earliest), and for the schema we make sure there is always a checkpoint at the earliest remaining version.

So to set up the test, let's create 5.checkpoint and 10.checkpoint, then remove logs 0-5.

Contributor Author:

Fixed. Thanks!
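A rough sketch of the setup huan233usc describes, under those assumptions (the checkpoint calls are indicated by comments; the only hard fact used is that Delta commit files are 20-digit zero-padded .json files under _delta_log):

```java
// After committing version 5:  deltaLog.checkpoint();
// After committing version 10: deltaLog.checkpoint();
// Then delete commit files 0..5 so the early versions are only reachable
// through the version-5 checkpoint.
for (int v = 0; v <= 5; v++) {
  Files.deleteIfExists(
      Paths.get(tablePath, "_delta_log", String.format("%020d.json", v)));
}
```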

// DeltaSQLConf.DELTA_CDF_ALLOW_OUT_OF_RANGE_TIMESTAMP.
if (options.startingVersion().isDefined()) {
DeltaStartingVersion startingVersion = options.startingVersion().get();
if (startingVersion.equals(StartingVersionLatest$.MODULE$)) {

Contributor:

Why not use instanceof instead of equals?

Contributor Author:

Done.

String exceptionMessage = e.getMessage();
if (exceptionMessage != null
&& exceptionMessage.contains("Unsupported Delta reader features")) {
throw new RuntimeException(e);

Contributor:

Why do we need to wrap the KernelException in a RuntimeException? Also DSv1 throws "DeltaUnsupportedTableFeatureException". Can we throw the same exception to maintain the same error handling behavior?

Contributor Author:

  1. You are right, KernelException is unchecked; I removed the wrapping.
  2. It's challenging to convert the KernelException to DSv1's DeltaUnsupportedTableFeatureException in this case, as it doesn't expose any error params. I agree we should try to throw the same exceptions as DSv1, but sometimes it's challenging to do so. Is it really that important to keep the exception types consistent if we are failing the stream anyway?

// Suppress other KernelExceptions
logger.warn("Protocol validation failed at version {} with: {}", version, e.getMessage());
return false;
} catch (Exception e) {

Contributor:

I don't think we want to suppress all other exceptions here, for example InterruptedException. We should just follow what DSv1 is doing and only suppress NonFatal exceptions.

Contributor Author:

What we are doing here is equivalent to Scala's NonFatal (see the sketch after this list):

  • We won't catch VirtualMachineError, ThreadDeath, LinkageError (extends Error)
  • We won't catch ControlThrowable (extends Throwable)
  • We don't need to catch InterruptedException because it won't be thrown by the kernel.
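Concretely, the catch structure approximates NonFatal like this (illustrative; isUnsupportedFeature is a hypothetical helper standing in for the message check discussed earlier):

```java
try {
  // snapshot reconstruction / protocol validation
  return true;
} catch (KernelException e) {
  if (isUnsupportedFeature(e)) {
    throw e; // surface unsupported-feature failures to the user
  }
  logger.warn("Protocol validation failed at version {} with: {}", version, e.getMessage());
  return false;
} catch (Exception e) {
  // Catches remaining checked and runtime exceptions, but not Error subclasses
  // (VirtualMachineError, LinkageError, ...) or Scala's ControlThrowable,
  // matching NonFatal's exclusions.
  logger.warn("Protocol validation failed at version {} with: {}", version, e.getMessage());
  return false;
}
```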

Arguments.of(/* startingVersion= */ "3", /* expectedVersion= */ Optional.of(3L)),
Arguments.of(/* startingVersion= */ "5", /* expectedVersion= */ Optional.of(5L)),
Arguments.of(/* startingVersion= */ "latest", /* expectedVersion= */ Optional.of(6L)),
Arguments.of(/* startingVersion= */ null, /* expectedVersion= */ Optional.empty()));

Contributor:

What about these cases:

  1. startingVersion is not set
  2. startingVersion set to Optional.empty

Contributor Author:

(1) is already covered by testGetStartingVersion. I added (2): testGetStartingVersion_NoOptions
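A hypothetical shape of the added case (createStream is an illustrative helper, not necessarily the PR's test utility):

```java
@Test
public void testGetStartingVersion_NoOptions() {
  // No read options supplied at all: the starting version should be absent.
  SparkMicroBatchStream stream = createStream(/* options= */ Optional.empty());
  assertEquals(Optional.empty(), stream.getStartingVersion());
}
```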

* Validate the protocol at a given version. If the snapshot reconstruction fails for any other
* reason than unsupported feature exception, we suppress it. This allows fallback to previous
* behavior where the starting version/timestamp was not mandatory to point to reconstructable
* snapshot.

Contributor:

This allows fallback to previous behavior where the starting version/timestamp was not mandatory to point to reconstructable snapshot.

Where is this code?

Contributor Author:

Please see

         if (!validateProtocolAt(spark, tablePath, engine, version)) {
           // When starting from a given version, we don't require that the snapshot of this
           // version can be reconstructed, even though the input table is technically in an
           // inconsistent state. If the snapshot cannot be reconstructed, then the protocol
           // check is skipped, so this is technically not safe, but we keep it this way for
           // historical reasons.
           try {
             streamingHelper.checkVersionExists(
                 version, /* mustBeRecreatable= */ false, /* allowOutOfRange= */ false);

/* catalogTableOpt= */ Option.empty(),
options,
/* snapshotAtSourceInit= */ snapshot,
/* metadataPath= */ tablePath + "/_checkpoint",

Collaborator:

Is this the location of the streaming checkpoint?

Contributor Author:

This is the location of PersistedMetadata for advanced schema evolution.

@huan233usc requested review from jerrypeng and raveeram-db and removed the review request for jerrypeng on October 24, 2025 17:25
// Attempt to construct a snapshot at the startingVersion to validate the protocol
// If snapshot reconstruction fails, fall back to old behavior where the only
// requirement was for the commit to exist
TableManager.loadSnapshot(tablePath).atVersion(version).build(engine);

Collaborator:

Actually, we should define a method in the streaming helper (maybe called LoadSnapshotAt) for this.
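Presumably something along these lines (name per the suggestion; the return type is assumed to be Kernel's Snapshot):

```java
// In the streaming helper: centralize Kernel snapshot loading at a version.
Snapshot loadSnapshotAt(long version) {
  return TableManager.loadSnapshot(tablePath).atVersion(version).build(engine);
}
```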
