[LIVY-789] Ensure Spark data files are available in interactive session#503

Open
ArnavBalyan wants to merge 2 commits into apache:master from ArnavBalyan:arnavb/fix-shell-files

Conversation

@ArnavBalyan (Member) commented Dec 13, 2025

What changes were proposed in this pull request?

  • Sets the CWD for interactive sessions to the Spark root directory so that data files are available. Without this, the --files argument does not work for interactive sessions, because the shell does not have access to the staging directory where the files are made available.
  • Deprecates unused Spark 1.x code.
  • Adds unit tests to ensure files are available when set for interactive sessions.
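The core idea of the fix can be illustrated with a small sketch (this is an illustration, not Livy's actual implementation): when the child process is launched with its working directory set to the directory where files were localized, relative paths passed via --files resolve correctly.

```python
import os
import subprocess
import sys
import tempfile

def run_with_cwd(command, work_dir):
    # Setting cwd makes files staged into work_dir reachable by relative path,
    # the same way setting the session's CWD to the staging/root directory
    # makes --files content visible to the interactive shell.
    return subprocess.run(command, cwd=work_dir, capture_output=True, text=True)

# Simulate a staging directory with one localized data file.
staging = tempfile.mkdtemp()
with open(os.path.join(staging, "data.txt"), "w") as f:
    f.write("hello-files")

# The child process opens the file by its relative name only.
result = run_with_cwd(
    [sys.executable, "-c", "print(open('data.txt').read())"], staging)
print(result.stdout.strip())  # hello-files
```

If the child were launched from some other working directory, the same relative open would fail, which mirrors the reported bug.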

How was this patch tested?

  • UT

Closes LIVY-789

@ArnavBalyan (Member, Author):

cc @gyogal @lmccay thanks! :)

@ArnavBalyan (Member, Author):

cc @gyogal gentle reminder when you have time, thanks!

@gyogal (Contributor) commented Dec 29, 2025

Hi @ArnavBalyan , thanks for submitting a PR for this issue. I think the change itself looks good, but I am not able to run the test successfully. This is what I am getting on my machine:

- should open files localized via spark.files by relative path *** FAILED *** (672 milliseconds)
  ExecuteAborted(Traceback (most recent call last):
    File ".../pr-review/repl/scala-2.12/target/tmp/2817641729331086602", line 664, in <module>
      sys.exit(main())
               ^^^^^^
    File ".../pr-review/repl/scala-2.12/target/tmp/2817641729331086602", line 523, in main
      exec('from pyspark.sql import HiveContext', global_dict)
    File "<string>", line 1, in <module>
  ModuleNotFoundError: No module named 'pyspark') did not equal ExecuteSuccess(JObject(List((text/plain,JString('hello-files'))))) (PythonInterpreterSpec.scala:300)

Edit: It may be possible to move this test to InteractiveIT in livy-integration-test, and then the spark.livy.forceSparkFilesTest part would not be needed since integration tests already run with LIVY_TEST=false.
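For context on the failure above: the test aborts when pyspark cannot be imported in the test environment. A minimal guard for such a situation (a hypothetical sketch, not Livy's actual test code) would check importability before running pyspark-dependent assertions:

```python
from importlib.util import find_spec

def pyspark_available() -> bool:
    # find_spec returns None when the module cannot be located,
    # without actually importing it.
    return find_spec("pyspark") is not None

if not pyspark_available():
    print("skipping pyspark-dependent test")
```

Moving the test into the integration-test module, as suggested above, sidesteps the problem differently: that environment is expected to provide pyspark.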

@ArnavBalyan (Member, Author):

> Hi @ArnavBalyan , thanks for submitting a PR for this issue. I think the change itself looks good, but I am not able to run the test successfully. This is what I am getting on my machine:
>
> - should open files localized via spark.files by relative path *** FAILED *** (672 milliseconds)
>   ExecuteAborted(Traceback (most recent call last):
>     File ".../pr-review/repl/scala-2.12/target/tmp/2817641729331086602", line 664, in <module>
>       sys.exit(main())
>                ^^^^^^
>     File ".../pr-review/repl/scala-2.12/target/tmp/2817641729331086602", line 523, in main
>       exec('from pyspark.sql import HiveContext', global_dict)
>     File "<string>", line 1, in <module>
>   ModuleNotFoundError: No module named 'pyspark') did not equal ExecuteSuccess(JObject(List((text/plain,JString('hello-files'))))) (PythonInterpreterSpec.scala:300)
>
> Edit: It may be possible to move this test to InteractiveIT in livy-integration-test, and then the spark.livy.forceSparkFilesTest part would not be needed since integration tests already run with LIVY_TEST=false.

Hey Gyorgy, thanks for raising this. I was able to run the test a while back, so let me check. Moving it to livy-integration-test is a great idea; I will test it out and get back.

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-shell-files branch 2 times, most recently from 3d4bf20 to efcf1e9 Compare January 22, 2026 09:17
@ArnavBalyan (Member, Author):

Hi @gyogal, I tested the flow again. The PySpark issue may be due to a missing local pyspark; I am able to run the test locally with pyspark available in the environment. The test uses the PythonInterpreter API, which is not available in livy-integration-test; making it available there would require a refactor and some wiring up for pyspark. Thanks!

@ArnavBalyan (Member, Author):

cc @gyogal gentle reminder, thanks!

@gyogal (Contributor) commented Feb 17, 2026

@ArnavBalyan Could you please try running the unit test for this PR by rebasing to this commit or by adding a similar change to this PR? If the unit test passes, this could be merged but for some reason the newly added test fails for me when running locally (with no pyspark on the PATH).

@gyogal (Contributor) commented Feb 23, 2026

I have now merged the PR to enable both unit and integration tests for PRs, could you please rebase your changes? If the unit tests are successful, we could merge this PR.

@ArnavBalyan ArnavBalyan force-pushed the arnavb/fix-shell-files branch from efcf1e9 to ba51eeb Compare February 24, 2026 01:02