@scottsand-db
Collaborator

@scottsand-db scottsand-db commented Oct 14, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR adds:

  • Snapshot::getStatistics and SnapshotStatistics APIs
  • ChecksumWriteMode API
  • Snapshot::writeChecksum(Engine, ChecksumWriteMode) API
  • Updates tests

This PR enables the e2e flow of:

  • connector performs a write
  • connector asks for the post-commit snapshot
  • connector looks at the statistics to see which checksum write mode is available
  • for a post-commit snapshot it will be either SIMPLE or FULL (no one else should have written the CRC yet)
  • the connector decides whether it wants to pay the cost of the SIMPLE or FULL CRC write, and then invokes Snapshot::writeChecksum with the mode it is willing to pay for
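A minimal Java sketch of that connector-side flow, under stated assumptions: it uses the Optional-returning getChecksumWriteMode shape the thread settles on later (empty when a checksum already exists). PostCommitSnapshot, ConnectorCrcPolicy, and the allowFullReplay flag are hypothetical stand-ins, not the Delta Kernel API.

```java
import java.util.Optional;

/** Hypothetical stand-in for the two checksum write modes. */
enum ChecksumWriteMode { SIMPLE, FULL }

/** Hypothetical, minimal view of a post-commit snapshot (not the Delta Kernel API). */
interface PostCommitSnapshot {
  /** Empty when a checksum file already exists at this version. */
  Optional<ChecksumWriteMode> checksumWriteMode();

  /** Writes the checksum using the given mode. */
  void writeChecksum(ChecksumWriteMode mode);
}

final class ConnectorCrcPolicy {
  private ConnectorCrcPolicy() {}

  /**
   * Writes the checksum only if the available mode is one the connector is willing to pay for.
   * Returns the mode that was used, or empty if nothing was written.
   */
  static Optional<ChecksumWriteMode> maybeWriteChecksum(
      PostCommitSnapshot snapshot, boolean allowFullReplay) {
    Optional<ChecksumWriteMode> available = snapshot.checksumWriteMode();
    if (!available.isPresent()) {
      return Optional.empty(); // a checksum file already exists: nothing to do
    }
    ChecksumWriteMode mode = available.get();
    if (mode == ChecksumWriteMode.FULL && !allowFullReplay) {
      return Optional.empty(); // FULL means replaying the log; this connector opts out
    }
    snapshot.writeChecksum(mode);
    return Optional.of(mode);
  }
}
```

The cost decision stays with the connector: the snapshot only reports what is available, and the connector passes back the mode it accepts.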

How was this patch tested?

New UTs. Updated existing write-with-crc-suites to run using this new logic, too.

Does this PR introduce any user-facing changes?

New Snapshot APIs.

@scottsand-db scottsand-db changed the title done first pass w/o tests [Draft] [Kernel] SnapshotStatistics and new Snapshot write-crc-file APIs Oct 14, 2025
@scottsand-db scottsand-db marked this pull request as draft October 14, 2025 23:18
@scottsand-db scottsand-db requested a review from nicklan October 14, 2025 23:18
Collaborator

@nicklan nicklan left a comment

Generally looks great, thanks!

Possible other API:

writeChecksum(Engine engine, ChecksumWriteMode mode)

Then a connector that doesn't really care can just do:

writeChecksum(engine, statistics.getChecksumWriteMode())

@scottsand-db
Collaborator Author

@nicklan -- SG -- will implement writeChecksum(Engine engine, ChecksumWriteMode mode)

crcInfo.getFileSizeHistogram))
}

def executeCrcSimple(result: TransactionCommitResult, engine: Engine): TransactionCommitResult = {
Collaborator Author

@scottsand-db scottsand-db Oct 16, 2025

The intended semantic of this method was: write the CRC file in SIMPLE mode. Instead, the semantic it implemented was: write the CRC file only if the SimpleCrcHook was present. That was incorrect.

@scottsand-db scottsand-db marked this pull request as ready for review October 16, 2025 16:07
@scottsand-db scottsand-db force-pushed the stack/kernel_snapshot_crc_statistics_and_write branch from 4642bdd to a8afd0b Compare October 16, 2025 16:30
@scottsand-db scottsand-db self-assigned this Oct 16, 2025
@scottsand-db scottsand-db changed the title [Draft] [Kernel] SnapshotStatistics and new Snapshot write-crc-file APIs [Kernel] SnapshotStatistics and new Snapshot write-crc-file APIs Oct 16, 2025
Collaborator

@nicklan nicklan left a comment

Thanks, lgtm!

logger.info("Skipping writing checksum file: input mode was NONE");
return;
case SIMPLE:
final CRCInfo crcInfo =
Collaborator

If actual == SIMPLE, is this guaranteed to be true? I.e., can we replace this check with that one?

Collaborator Author

@scottsand-db scottsand-db Oct 17, 2025

Yes, if actual == SIMPLE, then logReplay.getCrcInfoAtSnapshotVersion() is defined.

Would you like something like the below?

if (actual != SIMPLE) throw new IllegalStateException(...)

final CRCInfo crcInfo = logReplay.getCrcInfoAtSnapshotVersion().get()

Using .getOrElse is just a bit more Java-idiomatic and a bit safer, but I'm okay if you want the above.

Collaborator Author

Actually -- I think I liked my checks/logic from a few commits ago:

We should just check whether the file already exists. If so, the write is a no-op: we should log a warning via logger.warn and exit early.

Collaborator Author

I think this is best:

  public void writeChecksum(Engine engine, Snapshot.ChecksumWriteMode mode) throws IOException {
    final Snapshot.ChecksumWriteMode actual = getStatistics().getChecksumWriteMode();

    switch (mode) {
      case NONE:
        logger.info("Skipping writing checksum file: input mode was NONE");
        return;
      case SIMPLE:
        if (actual == ChecksumWriteMode.NONE) {
          logger.warn("Not writing checksum in SIMPLE mode: checksum file already exists");
          return;
        }
        if (actual == ChecksumWriteMode.FULL) {
          throw new IllegalStateException(
              "Cannot write checksum in SIMPLE mode: FULL mode required");
        }

        // actual == SIMPLE implies the CRC info at this snapshot version is present
        final CRCInfo crcInfo = logReplay.getCrcInfoAtSnapshotVersion().get();
        logger.info("Executing checksum write in SIMPLE mode");
        new ChecksumWriter(logPath).writeCheckSum(engine, crcInfo);
        return;
      case FULL:
        if (actual == ChecksumWriteMode.NONE) {
          logger.warn("Not writing checksum in FULL mode: checksum file already exists");
          return;
        }
        if (actual == ChecksumWriteMode.SIMPLE) {
          logger.warn("Checksum SIMPLE mode was available, but FULL mode was requested");
        }

        logger.info("Executing checksum write in FULL mode");
        ChecksumUtils.computeStateAndWriteChecksum(engine, getLogSegment());
        return;
      default:
        throw new IllegalStateException("Unknown checksum write mode: " + mode);
    }
  }

@scottsand-db
Collaborator Author

scottsand-db commented Oct 17, 2025

@nicklan -- One consequence of void writeChecksum(Engine engine, Snapshot.ChecksumWriteMode mode) taking in the mode is that the mode can be NONE. It's a bit weird to take in a request to not write a checksum. Is that really better than two public APIs?

Another idea: what about Snapshot::writeChecksum(engine), which decides internally what to do? You can poke at the statistics beforehand to learn whether it would be a FULL or SIMPLE write, or NONE if the file already exists.

My updated (local) code is below:

  public void writeChecksum(Engine engine, Snapshot.ChecksumWriteMode mode) throws IOException {
    final Snapshot.ChecksumWriteMode actual = getStatistics().getChecksumWriteMode();

    switch (mode) {
      case NONE:
        logger.info("Skipping writing checksum file: input mode was NONE");
        if (actual != ChecksumWriteMode.NONE) {
          logger.warn("Note that the checksum file does NOT actually exist");
        }
        return;
      case SIMPLE:
        if (actual == ChecksumWriteMode.NONE) {
          logger.warn("Not writing checksum in SIMPLE mode: checksum file already exists");
          return;
        }
        if (actual == ChecksumWriteMode.FULL) {
          throw new IllegalStateException(
              "Cannot write checksum in SIMPLE mode: FULL mode required");
        }

        final CRCInfo crcInfo = logReplay.getCrcInfoAtSnapshotVersion().get();
        logger.info("Executing checksum write in SIMPLE mode");
        new ChecksumWriter(logPath).writeCheckSum(engine, crcInfo);
        return;
      case FULL:
        if (actual == ChecksumWriteMode.NONE) {
          logger.warn("Not writing checksum as FULL mode: checksum file already exists");
          return;
        }
        if (actual == ChecksumWriteMode.SIMPLE) {
          logger.warn("Requested checksum write in FULL mode, but SIMPLE mode is available");
        }
        logger.info("Executing checksum write in FULL mode");
        ChecksumUtils.computeStateAndWriteChecksum(engine, getLogSegment());
        return;
      default:
        throw new IllegalStateException("Unknown checksum write mode: " + mode);
    }
  }
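To make the (requested, actual) combinations easier to review, here is a pure-function model of the dispatch in that snippet, with the logging and I/O replaced by a returned action. CrcMode, CrcAction, and CrcDispatch are hypothetical names used only for illustration; this is a sketch of the decision table, not the Kernel code.

```java
/** Hypothetical three-valued mode mirroring the NONE/SIMPLE/FULL enum in the snippet. */
enum CrcMode { NONE, SIMPLE, FULL }

/** Hypothetical outcome labels standing in for the logging and I/O. */
enum CrcAction { SKIP_NOT_REQUESTED, SKIP_ALREADY_EXISTS, WRITE_SIMPLE, WRITE_FULL, FAIL }

final class CrcDispatch {
  private CrcDispatch() {}

  /** Returns what writeChecksum would do for a (requested, actual) pair. */
  static CrcAction decide(CrcMode requested, CrcMode actual) {
    switch (requested) {
      case NONE:
        return CrcAction.SKIP_NOT_REQUESTED; // caller asked for no write
      case SIMPLE:
        if (actual == CrcMode.NONE) {
          return CrcAction.SKIP_ALREADY_EXISTS; // checksum file already present
        }
        if (actual == CrcMode.FULL) {
          return CrcAction.FAIL; // the snippet throws IllegalStateException here
        }
        return CrcAction.WRITE_SIMPLE;
      case FULL:
        if (actual == CrcMode.NONE) {
          return CrcAction.SKIP_ALREADY_EXISTS;
        }
        return CrcAction.WRITE_FULL; // honored even if SIMPLE is available (with a warning)
      default:
        throw new IllegalStateException("Unknown checksum write mode: " + requested);
    }
  }
}
```

Every case ends in a return or a throw, so no case can fall through into the next one.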

Collaborator

@allisonport-db allisonport-db left a comment

Mostly just clarifying questions. Tests look great! Feeling a little confused today for some reason 😐

Comment on lines 158 to 160
* <li><b>FULL:</b> Computes the necessary state if needed by replaying the delta log since the
* latest checksum (if present). This always succeeds but may be expensive for large tables
* when CRC information is not available.
Collaborator

Nit: maybe rephrase this? "This always succeeds" sounds like it cannot fail (I assume it can!). I think this is instead trying to convey that, regardless of what the Snapshot has in memory, this will try to execute the CRC write.

Collaborator

Hmm, actually, why does this have a mode input at all? Can this not just look internally at what the mode is?

Or is this just to make the more expensive option an explicit opt-in?

Collaborator Author

Hmm, actually, why does this have a mode input at all? Can this not just look internally at what the mode is?

Agreed -- if you look at some of the PR comments, I'm thinking along these lines, too.

#5340 (review)

#5340 (comment)

Comment on lines 278 to 279
logReplay
.getCrcInfoAtSnapshotVersion()
Collaborator

Just curious, do you know why we store this here? I had to do some digging to understand how and when we populate this.

Collaborator Author

This has to do with our LogReplay / ProtocolMetadataReplay / DomainMetadata Replay logic.

Fundamentally, the Snapshot should be injected with CRC info (optional) and then use that as needed to defer to static log replay utilities (e.g. for domain metadata).

Instead, today we have some tech debt: Snapshot is injected with a LogReplay, and that LogReplay holds the CRC info.
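A rough Java sketch of the injection described above, under stated assumptions: the snapshot holds an optional CRC payload for its own version and hands it to a static replay helper, which falls back to a log replay when the payload is absent. CrcInfoSketch, DomainMetadataReplaySketch, SnapshotSketch, and the Supplier-based fallback are all hypothetical; the real CRCInfo and replay classes look different.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical, trimmed-down CRC payload; the real CRCInfo carries much more. */
final class CrcInfoSketch {
  final List<String> domainNames;

  CrcInfoSketch(List<String> domainNames) {
    this.domainNames = Collections.unmodifiableList(domainNames);
  }
}

/** Hypothetical static replay helper: prefer CRC info, otherwise fall back to a log replay. */
final class DomainMetadataReplaySketch {
  private DomainMetadataReplaySketch() {}

  static List<String> domainNames(
      Optional<CrcInfoSketch> crcInfo, Supplier<List<String>> logReplay) {
    return crcInfo.map(crc -> crc.domainNames).orElseGet(logReplay);
  }
}

/** Snapshot is injected with optional CRC info directly, rather than reaching through a LogReplay. */
final class SnapshotSketch {
  private final Optional<CrcInfoSketch> crcInfoAtVersion;
  private final Supplier<List<String>> logReplay;

  SnapshotSketch(Optional<CrcInfoSketch> crcInfoAtVersion, Supplier<List<String>> logReplay) {
    this.crcInfoAtVersion = crcInfoAtVersion;
    this.logReplay = logReplay;
  }

  Optional<CrcInfoSketch> getCrcInfoAtSnapshotVersion() {
    return crcInfoAtVersion;
  }

  List<String> getDomainNames() {
    return DomainMetadataReplaySketch.domainNames(crcInfoAtVersion, logReplay);
  }
}
```

With this shape, the snapshot owns the optional CRC state, and the replay logic stays in stateless helpers that can be tested in isolation.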

* SIMPLE and uses the post-commit snapshot's writeChecksumSimple method. Note, this requires the
* test suite uses [[commitTransaction]] and [[verifyWrittenContent]].
*/
trait WriteUtilsWithPostCommitSnapshotCrcSimpleWrite extends AnyFunSuite with WriteUtils {
Collaborator

Ideally we should factor this out like we did with TransactionBuilderSupport :( Not a blocker, though.

Collaborator Author

Can you elaborate, sorry?

Collaborator

I think we talked about this in depth before for other test scenarios. Ideally we have abstract definitions for these functions and then implement them in child traits, instead of just overriding them.

So TransactionCommitSupport would define these abstractly, and then we could have, e.g., BasicTransactionCommitSupport and PostCommitSnapshotCRCCommitSupport, one for each variation.

Collaborator Author

Makes sense -- that would let us use mixins to inject specific functionality, e.g. write CRC, write checkpoint, publish, etc.
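The real suites are Scala traits. As a hedged Java analogue, interfaces with default methods can express the same shape: one abstract commit-support contract and one sub-interface per variation, with each suite mixing in the variant it needs. All names below are hypothetical and only mirror the ones suggested in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstract contract: every suite commits the same way, variants differ afterwards. */
interface TransactionCommitSupport {
  /** Log of what happened, so the chosen behavior is observable. */
  List<String> events();

  /** Variant-specific step run after each commit. */
  void afterCommit(long version);

  /** Shared commit path used by every suite. */
  default void commitTransaction(long version) {
    events().add("commit " + version);
    afterCommit(version);
  }
}

/** Variant: plain commit, nothing extra. */
interface BasicTransactionCommitSupport extends TransactionCommitSupport {
  @Override
  default void afterCommit(long version) {}
}

/** Variant: write the CRC via the post-commit snapshot after each commit. */
interface PostCommitSnapshotCrcCommitSupport extends TransactionCommitSupport {
  @Override
  default void afterCommit(long version) {
    events().add("write crc " + version);
  }
}

/** A suite opts into a variant by implementing exactly one of the sub-interfaces. */
final class CrcWriteSuiteSketch implements PostCommitSnapshotCrcCommitSupport {
  private final List<String> events = new ArrayList<>();

  @Override
  public List<String> events() {
    return events;
  }
}
```

Unlike Scala's stackable traits, Java interfaces cannot easily chain several variants through super calls, so this analogue picks one variant per suite.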

Comment on lines +251 to +254
// Create version 2 without writing its CRC
val snapshot1 = TableManager.loadSnapshot(tablePath).build(engine)
val txn2 = snapshot1.buildUpdateTableTransaction("xx", Operation.WRITE).build(engine)
txn2.commit(engine, emptyIterable())
Collaborator

Sorry, why do we have to do this? Won't this test check the same thing if we remove this part?

Collaborator Author

This test writes a few versions (0, 1, 2), time travels to version 1, and then asserts that the checksum write mode at version 1 is FULL.

So, if you're asking why we write v2: that just felt like a reasonable thing to test. The alternative is we write versions 0 and 1, and then we time travel to the historical version 0. We could absolutely do that, too. I thought that having a few more versions and time traveling to a middle one was a bit more common.

Collaborator

Sorry, why is time travel the important part? If you just write versions (0, 1) and then load version 1 (not via the post-commit snapshot), isn't that the same scenario? Or maybe that was already tested?

So maybe the answer is: under the hood, this tests the same thing that's already tested, but the time-travel scenario semantically makes sense as a separate test.

Collaborator

The alternative is we write versions 0 and 1, and then we time travel to the historical version 0.

Not suggesting this! This scenario makes sense if we are trying to test time travel; I just didn't understand why it was really a distinct case for the underlying implementation.

Collaborator Author

I want to test a historical version (not the latest). All of the other tests are checking that we can write the CRC at the latest version (either loading it explicitly or using the post commit snapshot).

@scottsand-db
Collaborator Author

scottsand-db commented Oct 17, 2025

@nicklan I think another option is: Snapshot::getStatistics::getChecksumWriteMode returns Optional<ChecksumWriteMode> which is either SIMPLE or FULL. There is no NONE. That's what Optional.empty is for.

Thus, Snapshot::writeChecksum takes in a ChecksumWriteMode (not optional), which is then of course only SIMPLE or FULL. Much simpler. WDYT?

@nicklan
Collaborator

nicklan commented Oct 17, 2025

@nicklan I think another option is: Snapshot::getStatistics::getChecksumWriteMode returns Optional<ChecksumWriteMode> which is either SIMPLE or FULL. There is no NONE. That's what Optional.empty is for.

Thus, Snapshot::writeChecksum takes in a ChecksumWriteMode (not optional), which is then of course only SIMPLE or FULL. Much simpler. WDYT?

Yep, this makes sense to me: it allows communicating that no checksum should be written, but does not allow "requesting that no checksum be written", which doesn't really make sense.
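A small Java sketch of the agreed shape, assuming a mode enum with only SIMPLE and FULL: the statistics report Optional.empty() when no checksum should be written, and the write method takes a non-null mode. The derivation rule in checksumWriteMode (existing file, then in-memory CRC info, else a full replay) is an assumption pieced together from this thread, and WriteMode and ChecksumStatsSketch are hypothetical names.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical enum: only modes that request real work; "nothing to write" is Optional.empty(). */
enum WriteMode { SIMPLE, FULL }

final class ChecksumStatsSketch {
  private ChecksumStatsSketch() {}

  /**
   * Assumed derivation: empty if a checksum file already exists at this version, SIMPLE if
   * CRC info for this version is already available in memory, otherwise FULL.
   */
  static Optional<WriteMode> checksumWriteMode(boolean crcFileExists, boolean crcInfoAvailable) {
    if (crcFileExists) {
      return Optional.empty();
    }
    return Optional.of(crcInfoAvailable ? WriteMode.SIMPLE : WriteMode.FULL);
  }

  /**
   * Takes a non-null mode, so "please write nothing" cannot be requested.
   * Returns a description of the work instead of doing I/O.
   */
  static String writeChecksum(WriteMode mode) {
    Objects.requireNonNull(mode, "mode");
    switch (mode) {
      case SIMPLE:
        return "write crc from in-memory info";
      case FULL:
        return "replay log, then write crc";
      default:
        throw new IllegalStateException("Unknown mode: " + mode);
    }
  }
}
```

Under this assumed rule, a snapshot time-traveled to a historical version with no CRC file and no in-memory CRC info would report FULL, which lines up with the time-travel test discussed above.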

@allisonport-db
Collaborator

I'm still wondering whether writeChecksum really needs to take in a mode, and if so, why? (I'm very open to hearing the reasoning here, because I can imagine there are a few possible arguments.)

But couldn't you just presume that, if you call writeChecksum, the caller is responsible for having checked the mode beforehand? The mode is already defined by the same object. It's kind of weird to get a descriptor from your snapshot and then pass it back as an argument to a method on the same snapshot.

@scottsand-db
Collaborator Author

@allisonport-db Yup, fair questions. Both of these APIs are acceptable and reasonable. It comes down to a matter of slight preference, that's all.

writeChecksum(mode) means that the caller is explicitly aware of which mode is being used: they know whether it will be cheap or expensive. The alternative we want to avoid is callers invoking writeChecksum() without being aware that it will require a very expensive full log replay.

logger.warn("Requested checksum write in FULL mode, but SIMPLE mode is available");
}
logger.info("Executing checksum write in FULL mode");
ChecksumUtils.computeStateAndWriteChecksum(engine, getLogSegment());
Collaborator

Can't we just do new ChecksumWriter(logPath).writeCheckSum(engine, crcInfo) here when actual == ChecksumWriteMode.SIMPLE?

Collaborator

Not sure if computeStateAndWriteChecksum shortcuts that when possible?

Collaborator

Maybe we should switch on the actual mode instead?

Collaborator Author

We can, of course, but I don't think we should. The user has asked for a FULL replay, so that's what we should do. And we log a warning that they are doing this even though SIMPLE is available.

Collaborator Author

This is also what @nicklan asked for too -- #5340 (comment). I have a slight preference for that way (this current way) but totally understand your perspective, too (that's what I had initially coded up to begin with).

So -- I'm inclined to just go with this and merge this PR sooner rather than later. If you feel strongly about changing this, feel free to start a convo and we can easily change it later.

// ===== THEN =====
val snapshot0 = result0.getPostCommitSnapshot.get()
assert(snapshot0.getStatistics.getChecksumWriteMode == ChecksumWriteMode.SIMPLE) // expected
assert(snapshot0.getStatistics.getChecksumWriteMode.get == ChecksumWriteMode.SIMPLE)
Collaborator

nit - use .contains?

Collaborator Author

This is a Java Optional -- it doesn't have .contains :/
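For reference, java.util.Optional indeed has no contains method. Comparing against Optional.of(value) works because Optional.equals compares the contained values, and filter(...).isPresent() is an equivalent spelling. The helper class and enum below are hypothetical, just to show both forms.

```java
import java.util.Optional;

final class OptionalContains {
  private OptionalContains() {}

  /** Hypothetical enum standing in for ChecksumWriteMode. */
  enum Mode { SIMPLE, FULL }

  /** True if opt holds a value equal to value; Optional.equals compares contained values. */
  static <T> boolean contains(Optional<T> opt, T value) {
    return opt.equals(Optional.of(value)); // Optional.of rejects a null value
  }

  /** Equivalent spelling using filter(...).isPresent(). */
  static <T> boolean containsViaFilter(Optional<T> opt, T value) {
    return opt.filter(value::equals).isPresent();
  }
}
```

In the Scala test, the same idea would read assert(snapshot0.getStatistics.getChecksumWriteMode == Optional.of(ChecksumWriteMode.SIMPLE)), since Scala's == delegates to equals for reference types.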

@delta-io delta-io deleted a comment from allisonport-db Oct 17, 2025
@scottsand-db scottsand-db merged commit b02611f into delta-io:master Oct 17, 2025
27 checks passed