Conversation

@huan233usc
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

How was this patch tested?

Does this PR introduce any user-facing changes?

- Changed from baseDirectory.getParentFile/spark to (delta-spark-v1/baseDirectory).value
- This is more explicit and clearly shows we're using delta-spark-v1's directory
- Makes the relationship between modules more obvious

Issue: Test resource directories were using baseDirectory.getParentFile/spark
which could evaluate to the wrong path depending on evaluation order.

Solution: Changed all test path configurations to consistently use
(delta-spark-v1/baseDirectory).value:
- Test/unmanagedSourceDirectories
- Test/unmanagedResourceDirectories
- Test/resourceDirectory
- Test/baseDirectory
- Test/javaOptions (-Duser.dir)

This ensures all test paths correctly point to the spark/ directory
regardless of evaluation order, fixing GitHub Actions failures.
The spark module was adding all of the test javaOptions again (they are already set
in commonSettings), causing duplicates. Now it only adds -Duser.dir, which is spark-specific.
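
A minimal sbt sketch of what the resulting settings might look like, assuming
`delta-spark-v1` is a project whose baseDirectory is the spark/ folder; the module
name, file paths, and source-tree layout below are assumptions, not the actual
Delta build definition:

```scala
// Hypothetical build.sbt excerpt.
lazy val spark = (project in file("spark-combined"))
  .settings(
    // Resolve every test path against delta-spark-v1's directory (spark/),
    // never against this module's own baseDirectory (spark-combined/).
    Test / unmanagedSourceDirectories +=
      (`delta-spark-v1` / baseDirectory).value / "src" / "test" / "scala",
    Test / unmanagedResourceDirectories +=
      (`delta-spark-v1` / baseDirectory).value / "src" / "test" / "resources",
    Test / resourceDirectory :=
      (`delta-spark-v1` / baseDirectory).value / "src" / "test" / "resources",
    Test / baseDirectory := (`delta-spark-v1` / baseDirectory).value,
    // Only the spark-specific option is added here; the common test
    // javaOptions already come from commonSettings.
    Test / javaOptions +=
      s"-Duser.dir=${(`delta-spark-v1` / baseDirectory).value.getAbsolutePath}"
  )
```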

Root cause: TestParallelization.defaultForkOptions was using baseDirectory.value
for workingDirectory, but the spark module's Test/baseDirectory points to spark/
while baseDirectory points to spark-combined/.

When GitHub Actions runs 'spark/test' with TEST_PARALLELISM_COUNT=4 SHARD_ID=x,
the forked test JVMs got spark-combined/ as working directory, causing tests
that use relative paths (like 'src/test/resources/delta/table-with-dv-large')
to fail.

Solution: Changed defaultForkOptions to use (Test/baseDirectory).value instead
of baseDirectory.value, so it correctly uses spark/ as the working directory.

This only affects the spark module, which is the only user of TestParallelization.
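
As a sketch, assuming defaultForkOptions is an sbt Def.Initialize[ForkOptions]
used when forking the sharded test JVMs (the surrounding names and options are
assumptions), the fix amounts to one scoping change:

```scala
import sbt._
import sbt.Keys._

// Hypothetical excerpt; the real TestParallelization sets more fork options.
def defaultForkOptions: Def.Initialize[ForkOptions] = Def.setting {
  ForkOptions()
    // was: .withWorkingDirectory(baseDirectory.value)      -> spark-combined/
    .withWorkingDirectory((Test / baseDirectory).value)     // -> spark/
    .withRunJVMOptions((Test / javaOptions).value.toVector)
}
```
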
Issue: serverClassPath contains multiple 'classes' directories with the same name
(e.g., spark/target/scala-2.12/classes, storage/target/scala-2.12/classes, etc.).
When creating symlinks, the code tried to create multiple symlinks all named 'classes',
causing FileAlreadyExistsException.

Solution: Track created symlink names in a Set and skip duplicates. Only the first
occurrence of each filename will have a symlink created.

Also added a Files.exists() check and a similar fix for the log4j properties symlink.

The issue is simply that serverClassPath contains multiple directories with
the same name (e.g., 7 different 'classes' directories). Using a Set to track
created symlink names is sufficient - no need for try-catch or concurrent
access handling since each shard runs in its own workspace.
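
A minimal sketch of the de-duplication, assuming serverClassPath is a Seq[File]
and workspaceDir is the shard's workspace directory (both names are assumptions,
not the exact PR code):

```scala
import java.io.File
import java.nio.file.{Files, Paths}
import scala.collection.mutable

// Hypothetical helper mirroring the described fix.
def linkServerClassPath(serverClassPath: Seq[File], workspaceDir: File): Unit = {
  val created = mutable.Set.empty[String]
  serverClassPath.foreach { entry =>
    // Many modules contribute a directory literally named "classes";
    // only the first occurrence of each name gets a symlink.
    if (created.add(entry.getName)) {
      val link = Paths.get(workspaceDir.getAbsolutePath, entry.getName)
      // Guard against a pre-existing link to avoid FileAlreadyExistsException.
      if (!Files.exists(link)) {
        Files.createSymbolicLink(link, entry.toPath)
      }
    }
  }
}
```
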
Changed kernelDefaults to depend on the local delta-spark-v1 module instead of
the published delta-spark 3.3.2 artifact. This makes the dependency consistent with
goldenTables (which already uses delta-spark-v1) and allows testing
against the current codebase.

Changes:
- Added .dependsOn(`delta-spark-v1` % "test") to kernelDefaults
- Removed the external "io.delta" %% "delta-spark" % "3.3.2" % "test" dependency
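
In build.sbt terms, the change might look roughly like this; the project
location and settings shown are assumptions:

```scala
// Hypothetical build.sbt excerpt; the real project has many more settings.
lazy val kernelDefaults = (project in file("kernel/kernel-defaults"))
  // Depend on the local module instead of the published artifact,
  // matching what goldenTables already does:
  .dependsOn(`delta-spark-v1` % "test")
  .settings(
    libraryDependencies ++= Seq(
      // removed: "io.delta" %% "delta-spark" % "3.3.2" % "test"
    )
  )
```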