[SPARK-54022][SPARK-56617][SQL][TESTS] Add more CACHE TABLE tests and reorganize CACHE TABLE tests#55536

Open
longvu-db wants to merge 9 commits into apache:master from longvu-db:spark-dsv2-cache-scenario-5

longvu-db commented Apr 24, 2026

What changes were proposed in this pull request?

Reorganize and expand the CACHE TABLE test coverage in DataSourceV2DataFrameSuite.

Moved tests: Three existing cache-pinning tests are relocated to the bottom of the suite, grouped under a section comment for discoverability:

  • "cached DSv2 table DataFrame is refreshed and reused after insert" (Scenario 2)
  • "caching table via Dataset API should pin table state" (Scenario 1+2)
  • "caching a query via Dataset API should not pin table state"

New tests:

  • Scenario 3 ("cached table pinned against external schema change"): External ADD COLUMN via catalog API is invisible to the cached table.
  • Scenario 4 ("session schema change invalidates cache"): Session ALTER TABLE ADD COLUMN invalidates and rebuilds cache with the new 3-column schema.
  • Scenario 5 ("cached table after external drop and recreate sees empty table"): External drop+recreate via catalog API produces a new table with a different ID; query sees the new empty table.
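For readers who have not seen the design doc, the behavior the three new tests assert can be sketched with a toy model in plain Scala. This is only an illustration: `ToyTable`, `ToyCatalog`, `ToySession`, and `CachePinningSketch` are hypothetical names, none of them are Spark APIs, and the ID comparison in `query` is an assumed mechanism chosen to reproduce the outcomes described above, not a description of how Spark's CacheManager works.

```scala
// Toy model only: none of these names are Spark APIs.
final case class ToyTable(id: Int, columns: Vector[String], rows: Vector[String])

// Hypothetical catalog. Calling it directly plays the role of an "external" change.
final class ToyCatalog {
  private var tables = Map.empty[String, ToyTable]
  private var nextId = 0

  def create(name: String, columns: Vector[String]): Unit = {
    nextId += 1 // every (re)created table gets a fresh ID
    tables += (name -> ToyTable(nextId, columns, Vector.empty))
  }
  def load(name: String): ToyTable = tables(name)
  def insert(name: String, row: String): Unit = {
    val t = tables(name)
    tables += (name -> t.copy(rows = t.rows :+ row))
  }
  def addColumn(name: String, column: String): Unit = {
    val t = tables(name)
    tables += (name -> t.copy(columns = t.columns :+ column))
  }
  def drop(name: String): Unit = tables -= name
}

// Hypothetical session holding pinned snapshots of cached tables.
final class ToySession(catalog: ToyCatalog) {
  private var cache = Map.empty[String, ToyTable]

  def cacheTable(name: String): Unit = cache += (name -> catalog.load(name))

  // Session DDL goes through the session, so the cache entry is rebuilt.
  def alterAddColumn(name: String, column: String): Unit = {
    catalog.addColumn(name, column)
    if (cache.contains(name)) cache += (name -> catalog.load(name))
  }

  // Assumption: the pinned snapshot is served only while the catalog
  // still holds a table with the same ID.
  def query(name: String): ToyTable = {
    val current = catalog.load(name)
    cache.get(name).filter(_.id == current.id).getOrElse(current)
  }
}

object CachePinningSketch {
  // Builds a two-column table with one row and caches it.
  private def cachedTable(): (ToyCatalog, ToySession) = {
    val catalog = new ToyCatalog
    catalog.create("t", Vector("id", "data"))
    catalog.insert("t", "1,a")
    val session = new ToySession(catalog)
    session.cacheTable("t")
    (catalog, session)
  }

  // Scenario 3: external ADD COLUMN is invisible to the cached table.
  def scenario3(): Vector[String] = {
    val (catalog, session) = cachedTable()
    catalog.addColumn("t", "extra")
    session.query("t").columns
  }

  // Scenario 4: session ADD COLUMN rebuilds the cache with three columns.
  def scenario4(): Vector[String] = {
    val (_, session) = cachedTable()
    session.alterAddColumn("t", "extra")
    session.query("t").columns
  }

  // Scenario 5: external drop and recreate yields a new, empty table.
  def scenario5(): Vector[String] = {
    val (catalog, session) = cachedTable()
    catalog.drop("t")
    catalog.create("t", Vector("id", "data"))
    session.query("t").rows
  }
}
```

In this sketch, `scenario3()` keeps the two-column schema, `scenario4()` returns three columns, and `scenario5()` returns no rows, mirroring the three outcomes listed above.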

Why are the changes needed?

The existing tests covered scenarios 1 and 2 (external data write pinning and session write invalidation), but did not cover:

  • External schema changes with cache pinning (scenario 3)
  • Session schema changes invalidating cache (scenario 4)
  • External drop and recreate of a cached table (scenario 5)

These scenarios are important to verify the correctness of DSv2 cache behavior as described in the design doc.

Does this PR introduce any user-facing change?

No. This PR only adds and reorganizes tests.

How was this patch tested?

New and moved tests in DataSourceV2DataFrameSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

…reorganize cache tests

Move existing cache-pinning tests to the bottom of DataSourceV2DataFrameSuite
and add three new tests covering design doc Section [5] scenarios:

- Scenario 3: external schema change is invisible to cached table
- Scenario 4: session schema change invalidates and rebuilds cache
- Scenario 5: external drop/recreate produces new table ID, query sees empty table

Co-authored-by: Isaac
longvu-db force-pushed the spark-dsv2-cache-scenario-5 branch from 174cefd to 75db50d on April 24, 2026 12:31
- Combine Scenarios 1+2 into single test following design doc flow
- Remove separate "variant" test
- Scenario 4: add external write after session schema change
- Scenario 5: remove ID references from comments

Co-authored-by: Isaac
- Restore all 3 original cache tests (moved to bottom)
- New Scenario 1+2 test uses InMemoryBaseTable.withData() to simulate
  true external INSERT bypassing session CacheManager
- Scenario 4 also uses withData() for external write simulation

Co-authored-by: Isaac
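The reason a direct `withData()` call counts as an external write can be shown with a small stand-alone model. This is a hedged sketch, not the test code: `ToyTable`, `ToyCacheManager`, and `ExternalWriteSketch` are hypothetical, and the toy `withData` only borrows the name from the commit message above. It does not reproduce the real `InMemoryBaseTable` signature.

```scala
import scala.collection.mutable

// Toy stand-in for a table; `withData` only mirrors the name used in the commit message.
final class ToyTable {
  val rows = mutable.ArrayBuffer.empty[Int]
  def withData(newRows: Seq[Int]): ToyTable = {
    rows ++= newRows
    this
  }
}

// Hypothetical cache manager: pins a snapshot and refreshes it only on session writes.
final class ToyCacheManager(table: ToyTable) {
  private var snapshot: Option[Vector[Int]] = None

  def cacheTable(): Unit = {
    snapshot = Some(table.rows.toVector)
  }

  // A session INSERT goes through the cache manager, so the snapshot is rebuilt.
  def sessionInsert(newRows: Seq[Int]): Unit = {
    table.withData(newRows)
    if (snapshot.isDefined) cacheTable()
  }

  // Reads are served from the pinned snapshot when one exists.
  def read(): Vector[Int] = snapshot.getOrElse(table.rows.toVector)
}

object ExternalWriteSketch {
  def run(): (Vector[Int], Vector[Int]) = {
    val table = new ToyTable
    val cacheManager = new ToyCacheManager(table)
    cacheManager.sessionInsert(Seq(1))
    cacheManager.cacheTable()

    table.withData(Seq(2)) // external write: bypasses the cache manager
    val afterExternal = cacheManager.read()

    cacheManager.sessionInsert(Seq(3)) // session write: refreshes the snapshot
    val afterSession = cacheManager.read()
    (afterExternal, afterSession)
  }
}
```

Here the read after the direct `withData` call still returns only the pinned row, while the read after the session insert returns all three rows, including the one written externally.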
longvu-db changed the title from "[SPARK-54022][SQL][TESTS] Add CACHE TABLE scenario 3, 4, 5 tests and reorganize cache tests" to "[SPARK-54022][SQL][TESTS] Add more CACHE TABLE tests and reorganize CACHE TABLE tests" on Apr 24, 2026
longvu-db changed the title from "[SPARK-54022][SQL][TESTS] Add more CACHE TABLE tests and reorganize CACHE TABLE tests" to "[SPARK-56617][SQL][TESTS] Add more CACHE TABLE tests and reorganize CACHE TABLE tests" on Apr 24, 2026
longvu-db changed the title from "[SPARK-56617][SQL][TESTS] Add more CACHE TABLE tests and reorganize CACHE TABLE tests" to "[SPARK-54022][SPARK-56617][SQL][TESTS] Add more CACHE TABLE tests and reorganize CACHE TABLE tests" on Apr 24, 2026
- Use loadTable with INSERT privilege to get original table (not copy)
  for withData calls, fixing copyOnLoad interaction
- Split Scenarios 1+2 into separate tests to avoid data accumulation
- All 8 cache tests pass locally

Co-authored-by: Isaac
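The copyOnLoad interaction mentioned in the commit above can be illustrated with a toy catalog. This is a guess at the shape of the problem based only on the commit message: `ToyCatalog`, `ToyTable`, `ToyPrivilege`, and `CopyOnLoadSketch` are hypothetical, and the rule "a load with write privileges returns the original, a plain load returns a copy" is an assumption made for the sketch rather than a statement about the real test catalog.

```scala
import scala.collection.mutable

// Hypothetical privilege marker; not Spark's write-privilege type.
sealed trait ToyPrivilege
case object ToyInsert extends ToyPrivilege

final class ToyTable(initial: Vector[Int]) {
  val rows = mutable.ArrayBuffer.empty[Int]
  rows ++= initial
  def snapshotCopy(): ToyTable = new ToyTable(rows.toVector)
}

final class ToyCatalog(copyOnLoad: Boolean) {
  private val tables = mutable.Map.empty[String, ToyTable]

  def create(name: String): Unit = {
    tables(name) = new ToyTable(Vector.empty)
  }

  // Plain load: hands out a copy when copyOnLoad is enabled.
  def loadTable(name: String): ToyTable =
    if (copyOnLoad) tables(name).snapshotCopy() else tables(name)

  // Assumed behavior: a load that requests write privileges returns the original.
  def loadTable(name: String, privileges: Set[ToyPrivilege]): ToyTable = tables(name)

  def storedRows(name: String): Vector[Int] = tables(name).rows.toVector
}

object CopyOnLoadSketch {
  def run(): (Vector[Int], Vector[Int]) = {
    val catalog = new ToyCatalog(copyOnLoad = true)
    catalog.create("t")

    // Writing to the copy is lost: the stored table is unchanged.
    catalog.loadTable("t").rows += 1
    val afterCopyWrite = catalog.storedRows("t")

    // Writing through the privileged load reaches the stored table.
    catalog.loadTable("t", Set[ToyPrivilege](ToyInsert)).rows += 2
    val afterOriginalWrite = catalog.storedRows("t")
    (afterCopyWrite, afterOriginalWrite)
  }
}
```

In the sketch, the write to the copy never reaches the stored table, which is the kind of silent no-op an external-write simulation has to avoid.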