
feat(DeltaUtils): UNIC-2021 Overload stat methods to accept a pre-computed statsDf #287

Merged: laurabegin merged 1 commit into master from UNIC-2021 on Apr 8, 2026



laurabegin (Member) commented Apr 8, 2026

Summary

  • Adds DataFrame overloads for all DeltaUtils stat methods that previously only accepted a path: String
  • The path versions now delegate to the statsDf overloads, so callers that already have a snapshot from getTableStats can pass it directly and avoid re-reading the Delta log
  • Affected methods: getPartitionValues, getNumRecords, getNumRecordsPerPartition, getMinValues, getMinValuesPerPartition, getMaxValues, getMaxValuesPerPartition, getNullCounts, getNullCountsPerPartition
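The delegation pattern described above can be sketched without Spark. This is a minimal model under stated assumptions, not the datalake-lib source: `FileStats`, `StatsSnapshot`, the `logReads` counter, and the sample values are hypothetical stand-ins, with `StatsSnapshot` playing the role of the `statsDf` DataFrame returned by `getTableStats`. The real signatures are not shown in this PR and may differ.

```scala
// Hypothetical, Spark-free model of the overload pattern; not the datalake-lib source.
// One FileStats row stands in for the per-file statistics held in the Delta log.
final case class FileStats(partition: String, numRecords: Long)

object DeltaUtilsSketch {
  // Stand-in for the statsDf DataFrame returned by getTableStats.
  type StatsSnapshot = Seq[FileStats]

  // Counts simulated Delta log reads so the saving is observable.
  var logReads: Int = 0

  // Stand-in for getTableStats: pretends to read the Delta log at `path`.
  // The rows are made-up sample values.
  def getTableStats(path: String): StatsSnapshot = {
    logReads += 1
    Seq(
      FileStats("2026-04-07", 10L),
      FileStats("2026-04-07", 5L),
      FileStats("2026-04-08", 7L)
    )
  }

  // New overload: works on a snapshot the caller already holds.
  def getNumRecords(stats: StatsSnapshot): Long =
    stats.map(_.numRecords).sum

  // Path overload: reads the snapshot once, then delegates.
  def getNumRecords(path: String): Long =
    getNumRecords(getTableStats(path))

  // Same pattern for a per-partition statistic.
  def getNumRecordsPerPartition(stats: StatsSnapshot): Map[String, Long] =
    stats.groupBy(_.partition).map { case (p, files) => p -> files.map(_.numRecords).sum }

  def getNumRecordsPerPartition(path: String): Map[String, Long] =
    getNumRecordsPerPartition(getTableStats(path))
}
```

A caller that needs several statistics would call `getTableStats` once and pass the result to each snapshot overload, so `logReads` increases by one instead of once per statistic.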

Motivation

QA tests in unic-etl were calling getTableStats multiple times per dataset (once for a hasStats probe, then again inside each stat function). This caused redundant Delta log reads from Minio and a cascade of stuck Spark jobs. The new overloads allow callers to read the snapshot once and reuse it.

Test plan

  • Existing DeltaUtilsSpec tests pass
  • Verify the path overloads still behave identically (they delegate to the new statsDf overloads)
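The second check can be expressed as a small equivalence assertion. The sketch below is a Spark-free illustration only: `EquivalenceSketch`, the `StatsSnapshot` shape, the stubbed `getTableStats`, and the sample counts are all hypothetical. Against the real library, the comparison would presumably be between the collected results of the two overloads.

```scala
// Hypothetical stand-ins; the real methods take a Spark DataFrame and live in DeltaUtils.
object EquivalenceSketch {
  // Made-up snapshot shape: column name -> null count.
  type StatsSnapshot = Map[String, Long]

  // Stub for getTableStats; returns fixed, made-up sample values.
  def getTableStats(path: String): StatsSnapshot =
    Map("id" -> 0L, "name" -> 3L)

  // Snapshot overload: keeps only the columns that contain nulls.
  def getNullCounts(stats: StatsSnapshot): Map[String, Long] =
    stats.filter { case (_, n) => n > 0 }

  // Path overload: delegates to the snapshot overload.
  def getNullCounts(path: String): Map[String, Long] =
    getNullCounts(getTableStats(path))

  // True when both overloads return the same result for `path`.
  def overloadsAgree(path: String): Boolean =
    getNullCounts(path) == getNullCounts(getTableStats(path))
}
```

Because the path overload only delegates, the check is expected to hold by construction. Its value is as a regression guard in case a path overload later stops delegating.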

🤖 Generated with Claude Code

…omputed statsDf

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
laurabegin marked this pull request as ready for review on April 8, 2026 18:49

AhmedSabsabi commented Apr 8, 2026

Looks like a solid refactor, but I want to double-check how these methods are used in unic-etl before approving, just to make sure nothing breaks downstream.

Looks good to me.

laurabegin (Member, Author) commented

@AhmedSabsabi FYI, unic-etl uses a pinned version, so merging this will not break anything downstream. I will deploy a new datalake-lib version, then update the version in unic-etl and test the changes there.

laurabegin merged commit f50ca9c into master on Apr 8, 2026
1 check passed
laurabegin deleted the UNIC-2021 branch on April 8, 2026 19:02