Use logs dir as working directory #124966

rjernst · 2025-03-16T16:44:56Z

In the unexpected case that Elasticsearch dies due to a segfault or other similar native issue, a core dump is useful in diagnosing the problem. Yet core dumps are written to the working directory, which is read-only for most installations of Elasticsearch. This commit changes the working directory to the logs dir which should always be writeable.
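For illustration, here is a minimal sketch of launching a child process with the logs directory as its working directory, using the standard `ProcessBuilder.directory(File)` call. The class name, helper name, and the `some-logs-dir` path are hypothetical and are not taken from this PR. The actual change in the server CLI may be structured differently.

```java
import java.nio.file.Path;
import java.util.List;

final class WorkingDirSketch {
    // Hypothetical helper: configures a child process so that its working
    // directory (where core dumps would normally be written) is the logs dir.
    static ProcessBuilder withLogsDirAsWorkingDir(List<String> command, Path logsDir) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(logsDir.toFile());
        return builder;
    }

    public static void main(String[] args) {
        // Assumed path, for demonstration only; the process is not started here.
        Path logsDir = Path.of("some-logs-dir");
        ProcessBuilder builder = withLogsDirAsWorkingDir(List.of("java", "-version"), logsDir);
        System.out.println("working directory: " + builder.directory());
    }
}
```

Note that `ProcessBuilder.directory` does not check that the directory exists. A missing directory only surfaces as an `IOException` when `start()` is called.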

elasticsearchmachine · 2025-03-16T16:45:21Z

Hi @rjernst, I've created a changelog YAML for you.

elasticsearchmachine · 2025-03-16T16:45:21Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

ldematte

Looks good, but I have left some questions, and I see some tests are still failing

...t/groovy/org/elasticsearch/gradle/internal/test/rest/LegacyYamlRestTestPluginFuncTest.groovy

build-tools/src/main/java/org/elasticsearch/gradle/testclusters/ElasticsearchNode.java

distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/JvmOption.java

ldematte · 2025-04-08T07:38:17Z

...clusters/src/main/java/org/elasticsearch/test/cluster/local/AbstractLocalClusterFactory.java

-            );
+        private record ReplacementKey(String key, String fallback) {}
+
+        private Map<ReplacementKey, String> getJvmOptionsReplacements() {


This became much more complicated, mirroring the ElasticsearchNode logic. Is this intentional? (And the question about the version ifs still stands: I don't understand why we need them.)

The ElasticsearchNode logic is for legacy integ tests, while this factory is for the new-style integ tests. They are different systems, and intentionally don't share code (we want to remove the legacy, not be hamstrung by it). So it is expected that they have duplication (they already did).

My point is: the previous code was much simpler, and now it contains all the logic for launching 7.x and even 6.x clusters -- do we need it?

I don't think the previous code was exactly simpler. There was a discrepancy between this code (the new test infra) and the old code (ElasticsearchNode). The changes to this function bring them in line with each other. We do need it because at minimum there is a difference in the jvm.options file between 8.19+ and <=8.18.x. The handling of versions before 6.3 is done in the else cases. I'm happy to remove that, but I would like to do it in a follow-up so we can be sure no tests are actually relying on it, consistently across old and new test infrastructure.

ldematte

LGTM

...t/groovy/org/elasticsearch/gradle/internal/test/rest/LegacyYamlRestTestPluginFuncTest.groovy

mosche · 2025-04-09T08:16:03Z

build-tools/src/main/java/org/elasticsearch/gradle/testclusters/ElasticsearchNode.java

+
+        ReplacementKey heapDumpPathSub;
+        if (version.before("8.19.0") && version.onOrAfter("6.3.0")) {
+            heapDumpPathSub = new ReplacementKey("-XX:HeapDumpPath=data", "");


I think using "" as fallback is wrong here: every string contains the empty string, which leads to unexpected and clearly wrong results when replacing. Should this be null, or should the fallback be optional, to make the intent clearer in this case?

I like your suggestion. I pushed 43db0f2
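To make the concern concrete, here is a small sketch of why an empty-string fallback is hazardous in a contains-style match, alongside one way to model "no fallback" explicitly. The `matches` helper and the `Optional` field are assumptions for illustration. They are not the code pushed in 43db0f2, and the real replacement logic may differ.

```java
import java.util.Optional;

final class FallbackSketch {
    // Hypothetical variant of the ReplacementKey record from the diff,
    // with the fallback made explicitly optional.
    record ReplacementKey(String key, Optional<String> fallback) {}

    // Assumed matching rule: a line matches if it contains the key,
    // or the fallback when a fallback is present.
    static boolean matches(String line, ReplacementKey k) {
        if (line.contains(k.key())) {
            return true;
        }
        return k.fallback().map(line::contains).orElse(false);
    }

    public static void main(String[] args) {
        String unrelated = "-Xms1g";
        ReplacementKey emptyFallback = new ReplacementKey("-XX:HeapDumpPath=data", Optional.of(""));
        ReplacementKey noFallback = new ReplacementKey("-XX:HeapDumpPath=data", Optional.empty());

        // Every string contains "", so an empty fallback matches unrelated lines.
        System.out.println(matches(unrelated, emptyFallback));
        // With no fallback, only lines containing the key match.
        System.out.println(matches(unrelated, noFallback));
    }
}
```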

mosche · 2025-04-09T08:23:57Z

...clusters/src/main/java/org/elasticsearch/test/cluster/local/AbstractLocalClusterFactory.java

+
+            ReplacementKey heapDumpPathSub;
+            if (version.before("8.19.0") && version.onOrAfter("6.3.0")) {
+                heapDumpPathSub = new ReplacementKey("-XX:HeapDumpPath=data", "");


Same as mentioned previously, the "" fallback looks troublesome. I think that needs special handling (and should be more explicit)

elasticsearchmachine · 2025-04-09T14:08:30Z

💔 Backport failed

Status	Branch	Result
❌	8.x	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 124966

donoghuc · 2025-04-09T19:43:31Z

@rjernst With this change I am seeing a difference in behavior. Previously, when Elasticsearch was invoked with path.data or path.logs pointing to directories that do not exist, it would start up. With this change it crashes. The PR elastic/logstash#17531 shows the problem and a proposed workaround for Logstash CI.

What is the expected behavior for ES when path.logs or path.data is configured to point to a location on disk that does not exist?

Here is a stack trace that seems to point to the recent change in ES:

# Command:
/Users/cas/elastic-repos/logstash/qa/integration/services/../../../build/elasticsearch/bin/elasticsearch \
 -Expack.security.enabled=false \
 -Epath.data=/tmp/ls_integration/es-data \
 -Ediscovery.type=single-node \
 -Epath.logs=/tmp/ls_integration/es-logs \
 -p /Users/cas/elastic-repos/logstash/qa/integration/services/../../../build/elasticsearch/elasticsearch.pid

# Error:
java.io.UncheckedIOException: java.io.IOException: Cannot run program "/Users/cas/elastic-repos/logstash/build/elasticsearch/jdk.app/Contents/Home/bin/java" (in directory "/tmp/ls_integration/es-logs"): error=2, No such file or directory
   at org.elasticsearch.server.cli.ServerProcessBuilder.start(ServerProcessBuilder.java:180)
   at org.elasticsearch.server.cli.ServerProcessBuilder.start(ServerProcessBuilder.java:141)
   at org.elasticsearch.server.cli.ServerCli.startServer(ServerCli.java:276)
   at org.elasticsearch.server.cli.ServerCli.execute(ServerCli.java:112)
   at org.elasticsearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:55)
   at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:101)
   at org.elasticsearch.cli.Command.main(Command.java:54)
   at org.elasticsearch.launcher.CliToolLauncher.main(CliToolLauncher.java:65)
Caused by: java.io.IOException: Cannot run program "/Users/cas/elastic-repos/logstash/build/elasticsearch/jdk.app/Contents/Home/bin/java" (in directory "/tmp/ls_integration/es-logs"): error=2, No such file or directory
   at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
   at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1044)
   at org.elasticsearch.server.cli.ServerProcessBuilder.createProcess(ServerProcessBuilder.java:204)
   at org.elasticsearch.server.cli.ServerProcessBuilder.start(ServerProcessBuilder.java:165)
   ... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
   at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
   at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:290)
   at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:221)
   at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1076)
   ... 10 more

With the change to using the logs dir as the working dir of the Elasticsearch process we need to ensure the logs dir exists within the CLI instead of later during startup. relates elastic#124966
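As a rough sketch of the idea described above, the directory can be created before it is handed to the child process as its working directory. This uses the standard `Files.createDirectories`, which also creates missing parents and does nothing if the directory already exists. The class and method names are hypothetical, and this is not necessarily how #126566 implements it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class EnsureLogsDirSketch {
    // Hypothetical helper: makes sure the logs dir exists before it is used
    // as the working directory of a child process.
    static Path ensureExists(Path logsDir) {
        try {
            // Creates missing parent directories too; no-op if it already exists.
            return Files.createDirectories(logsDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed location under the system temp directory, for demonstration only.
        Path logsDir = Path.of(System.getProperty("java.io.tmpdir"), "es-logs-sketch", "logs");
        Path created = ensureExists(logsDir);
        System.out.println(created + " exists: " + Files.isDirectory(created));
    }
}
```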

rjernst · 2025-04-09T21:35:37Z

Thanks for the report @donoghuc. That side effect was unintentional. I've opened #126566 to address the issue.


rjernst added the >enhancement, :Core/Infra/CLI, auto-backport, v8.18.1, v8.19.0, v9.0.1, and v9.1.0 labels Mar 16, 2025

rjernst requested a review from a team as a code owner March 16, 2025 16:44

elasticsearchmachine added the Team:Core/Infra label Mar 16, 2025

rjernst added 3 commits March 16, 2025 09:45

Update docs/changelog/124966.yaml

6bb542b

Use homedir to find platform dir

ee4956f

iter

3f65603

rjernst requested a review from a team as a code owner March 17, 2025 16:23

rjernst and others added 10 commits March 17, 2025 10:13

oops

0d3cdea

try to fix bwc

5e08b5a

update server cli

6be4935

tests

f4e0622

better substitution

9fae96e

[CI] Auto commit changes from spotless

75f664d

Merge branch 'main' into env/logs_working_dir

a658206

fixes

b54b497

[CI] Auto commit changes from spotless

c4c5f31

iter

1301be0

elasticsearchmachine added the serverless-linked label Mar 21, 2025

rjernst and others added 3 commits April 7, 2025 16:42

Merge branch 'main' into env/logs_working_dir

1136cad

fix tests

e01136f

[CI] Auto commit changes from spotless

264158f

sidestep forbidden api

ae3eb71

ldematte reviewed Apr 8, 2025

rjernst added 2 commits April 8, 2025 05:57

address feedback

0c9c3c4

don't use mock filesystems, we expect exactly one file

5243f77

rjernst removed v8.18.1 v9.0.1 labels Apr 8, 2025

rjernst enabled auto-merge (squash) April 8, 2025 17:19

rjernst disabled auto-merge April 8, 2025 18:01

ldematte approved these changes Apr 9, 2025

mosche reviewed Apr 9, 2025

rjernst added 2 commits April 9, 2025 05:44

feedback

43db0f2

Merge branch 'main' into env/logs_working_dir

5fb48da

rjernst merged commit 3bac50e into elastic:main Apr 9, 2025
17 checks passed

elasticsearchmachine added the backport pending label Apr 9, 2025

donoghuc mentioned this pull request Apr 9, 2025

Ensure elasticsearch logs and data dirs exist before startup elastic/logstash#17531

Merged

1 task

rjernst mentioned this pull request Apr 9, 2025

Ensure logs dir exists before using as working dir #126566

Merged

rjernst deleted the env/logs_working_dir branch April 9, 2025 21:35
