
Use logs dir as working directory #124966


Merged
merged 22 commits into from
Apr 9, 2025

Conversation

rjernst

@rjernst rjernst commented Mar 16, 2025

In the unexpected case that Elasticsearch dies due to a segfault or other similar native issue, a core dump is useful in diagnosing the problem. Yet core dumps are written to the working directory, which is read-only for most installations of Elasticsearch. This commit changes the working directory to the logs dir which should always be writeable.
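A minimal sketch of the idea in the description, assuming the server is launched as a child process through `java.lang.ProcessBuilder` (the stack trace later in this thread suggests so). The class `WorkingDirSketch` and the helper `withWorkingDir` are hypothetical names used only for illustration; this is not the code changed by the PR.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class WorkingDirSketch {
    // Hypothetical helper: builds a child process whose working directory is
    // the logs dir, so files written relative to the working directory
    // (such as core dumps) land somewhere writable.
    static ProcessBuilder withWorkingDir(List<String> command, Path logsDir) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(logsDir.toFile());
        return builder;
    }

    public static void main(String[] args) throws Exception {
        // A temp directory stands in for the configured path.logs.
        Path logsDir = Files.createTempDirectory("es-logs");
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();

        ProcessBuilder builder = withWorkingDir(List.of(javaBin, "-version"), logsDir);
        builder.inheritIO();
        int exitCode = builder.start().waitFor();
        System.out.println("working dir: " + builder.directory() + ", exit code: " + exitCode);
    }
}
```

Note that `ProcessBuilder.start()` fails if the configured directory does not exist, which is relevant to the regression reported further down.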

@rjernst rjernst added >enhancement :Core/Infra/CLI CLI utilities, scripts, and infrastructure auto-backport Automatically create backport pull requests when merged v8.18.1 v8.19.0 v9.0.1 v9.1.0 labels Mar 16, 2025
@rjernst rjernst requested a review from a team as a code owner March 16, 2025 16:44
@elasticsearchmachine

Hi @rjernst, I've created a changelog YAML for you.

@elasticsearchmachine

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label Mar 16, 2025
@rjernst rjernst requested a review from a team as a code owner March 17, 2025 16:23
@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Mar 21, 2025

@ldematte ldematte left a comment


Looks good, but I have left some questions, and I see some tests are still failing

);
private record ReplacementKey(String key, String fallback) {}

private Map<ReplacementKey, String> getJvmOptionsReplacements() {
Contributor

This became much more complicated, mirroring the ElasticsearchNode logic. Is this intentional? (And the question about the ifs on version still stands: I don't understand why we need them.)

Member Author

The ElasticsearchNode logic is for legacy integ tests, while this factory is for the new style integ tests. They are different systems, and intentionally don't share code (we want to remove the legacy, not be hamstrung by it). So it is expected they have duplication (they already did).

Contributor

My point is: the previous code was much simpler, and now it contains all the logic for launching 7.x and even 6.x clusters -- do we need it?

Member Author

I don't think the previous code was exactly simpler. There was a discrepancy between this code (the new test infra) and the old code (ElasticsearchNode). The changes to this function bring them in line with each other. We do need it because at minimum there is a difference in the jvm.options file between 8.19+ and <=8.18.x. Versions before 6.3 are handled in the else cases. I'm happy to remove that, but I would like to do it in a followup so we can be sure no tests are actually relying on it, consistently across old and new test infrastructure.

@rjernst rjernst enabled auto-merge (squash) April 8, 2025 17:19
@rjernst rjernst disabled auto-merge April 8, 2025 18:01

@ldematte ldematte left a comment


LGTM


ReplacementKey heapDumpPathSub;
if (version.before("8.19.0") && version.onOrAfter("6.3.0")) {
    heapDumpPathSub = new ReplacementKey("-XX:HeapDumpPath=data", "");
Contributor

I think using "" as the fallback is wrong here: every string contains this fallback, which leads to unexpected and clearly wrong results when replacing. Should this be null, or should the fallback be optional, to make the intent clearer in this case?

Member Author

I like your suggestion. I pushed 43db0f2
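
The concern in this thread can be illustrated with a short sketch. It assumes the fallback is matched with `String.contains` and substituted with `String.replace`; that is a guess at how the replacement is applied, not a quote from the PR. The `Optional` variant of `ReplacementKey` and the `apply` helper are hypothetical, and this is not necessarily what 43db0f2 does.

```java
import java.util.Optional;

public class FallbackSketch {
    // Hypothetical variant of the record from the diff: the fallback is
    // explicitly optional instead of using "" to mean "no fallback".
    record ReplacementKey(String key, Optional<String> fallback) {}

    // Replaces the key in the line, or the fallback if one is present and
    // matches. Lines that match neither are returned unchanged.
    static String apply(String line, ReplacementKey rk, String replacement) {
        if (line.contains(rk.key())) {
            return line.replace(rk.key(), replacement);
        }
        if (rk.fallback().isPresent() && line.contains(rk.fallback().get())) {
            return line.replace(rk.fallback().get(), replacement);
        }
        return line;
    }

    public static void main(String[] args) {
        // The empty string is a substring of every string...
        System.out.println("-Xms1g".contains(""));      // prints "true"
        // ...and replacing it inserts the replacement at every position.
        System.out.println("ab".replace("", "X"));      // prints "XaXbX"

        ReplacementKey rk = new ReplacementKey("-XX:HeapDumpPath=data", Optional.empty());
        // With an absent fallback, unrelated lines are left alone.
        System.out.println(apply("-Xms1g", rk, "-XX:HeapDumpPath=logs"));  // prints "-Xms1g"
    }
}
```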


ReplacementKey heapDumpPathSub;
if (version.before("8.19.0") && version.onOrAfter("6.3.0")) {
    heapDumpPathSub = new ReplacementKey("-XX:HeapDumpPath=data", "");
Contributor

Same as mentioned previously, the "" fallback looks troublesome. I think that needs special handling (and should be more explicit)

@rjernst rjernst merged commit 3bac50e into elastic:main Apr 9, 2025
17 checks passed
@elasticsearchmachine

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 124966

@donoghuc

donoghuc commented Apr 9, 2025

@rjernst With this change I am seeing a difference in behavior. Previously, when invoking elasticsearch with path.data or path.logs configured to directories that do not exist, elasticsearch would start up. With this change, it crashes. The PR elastic/logstash#17531 shows the problem and a proposed workaround for logstash CI.

What is the expected behavior for ES when path.logs or path.data is configured to point to a location on disk that does not exist?

Here is a stack trace that seems to point to the recent change in ES:

# Command:
/Users/cas/elastic-repos/logstash/qa/integration/services/../../../build/elasticsearch/bin/elasticsearch \
 -Expack.security.enabled=false \
 -Epath.data=/tmp/ls_integration/es-data \
 -Ediscovery.type=single-node \
 -Epath.logs=/tmp/ls_integration/es-logs \
 -p /Users/cas/elastic-repos/logstash/qa/integration/services/../../../build/elasticsearch/elasticsearch.pid

# Error:
java.io.UncheckedIOException: java.io.IOException: Cannot run program "/Users/cas/elastic-repos/logstash/build/elasticsearch/jdk.app/Contents/Home/bin/java" (in directory "/tmp/ls_integration/es-logs"): error=2, No such file or directory
   at org.elasticsearch.server.cli.ServerProcessBuilder.start(ServerProcessBuilder.java:180)
   at org.elasticsearch.server.cli.ServerProcessBuilder.start(ServerProcessBuilder.java:141)
   at org.elasticsearch.server.cli.ServerCli.startServer(ServerCli.java:276)
   at org.elasticsearch.server.cli.ServerCli.execute(ServerCli.java:112)
   at org.elasticsearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:55)
   at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:101)
   at org.elasticsearch.cli.Command.main(Command.java:54)
   at org.elasticsearch.launcher.CliToolLauncher.main(CliToolLauncher.java:65)
Caused by: java.io.IOException: Cannot run program "/Users/cas/elastic-repos/logstash/build/elasticsearch/jdk.app/Contents/Home/bin/java" (in directory "/tmp/ls_integration/es-logs"): error=2, No such file or directory
   at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
   at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1044)
   at org.elasticsearch.server.cli.ServerProcessBuilder.createProcess(ServerProcessBuilder.java:204)
   at org.elasticsearch.server.cli.ServerProcessBuilder.start(ServerProcessBuilder.java:165)
   ... 7 more
Caused by: java.io.IOException: error=2, No such file or directory
   at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
   at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:290)
   at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:221)
   at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1076)
   ... 10 more

@rjernst

rjernst commented Apr 9, 2025

Thanks for the report @donoghuc. That side effect was unintentional. I've opened #126566 to address the issue.

@rjernst rjernst deleted the env/logs_working_dir branch April 9, 2025 21:35
rjernst added a commit that referenced this pull request Apr 17, 2025
With the change to using the logs dir as the working dir of the
Elasticsearch process we need to ensure the logs dir exists within the
CLI instead of later during startup.

relates #124966
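
The failure reported above and the approach described in this follow-up commit message can be sketched as below. This is an illustration only, not the code from #126566: the class `EnsureLogsDirSketch` and the helper `prepare` are hypothetical, and a temp directory stands in for the configured path.logs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class EnsureLogsDirSketch {
    // Hypothetical helper: make sure the logs dir exists before it is used
    // as the working directory of the child process.
    static ProcessBuilder prepare(Path logsDir, List<String> command) throws IOException {
        // Creates missing parent directories too; does nothing if the
        // directory already exists.
        Files.createDirectories(logsDir);
        return new ProcessBuilder(command).directory(logsDir.toFile());
    }

    public static void main(String[] args) throws Exception {
        Path missing = Files.createTempDirectory("ls_integration").resolve("es-logs");
        String javaBin = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        List<String> command = List.of(javaBin, "-version");

        // Without the directory, start() throws an IOException, as in the
        // stack trace reported in this thread.
        try {
            new ProcessBuilder(command).directory(missing.toFile()).start();
        } catch (IOException e) {
            System.out.println("start failed: " + e.getMessage());
        }

        // With the directory created first, the process starts normally.
        Process process = prepare(missing, command).inheritIO().start();
        System.out.println("exit code: " + process.waitFor());
    }
}
```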
Labels
auto-backport Automatically create backport pull requests when merged backport pending :Core/Infra/CLI CLI utilities, scripts, and infrastructure >enhancement serverless-linked Added by automation, don't add manually Team:Core/Infra Meta label for core/infra team v8.19.0 v9.1.0