[ZEPPELIN-6241] Fail fast when default web app context fails to initialize due to missing resources #4969

Merged

Reamer merged 3 commits into apache:master from tbonelee:fail-fast on Jul 17, 2025

tbonelee commented Jul 14, 2025

Contributor

What is this PR for?

Currently, the server keeps running even when all web app contexts fail to initialize due to missing files or directories for web resources.

In my opinion, if the web resource path for the default web app context does not exist, it would be better to shut down the server immediately, since the context initialization will fail anyway. In such cases, other essential features like REST APIs and WebSocket communication also won’t work properly, so keeping the server running doesn’t seem meaningful.

The absence of non-default web resources, however, seems generally acceptable. So this PR ensures that we only fail fast when the default web app context is missing its required resources.
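The policy above can be sketched in plain Java, independent of Jetty. This is a minimal illustration under stated assumptions, not Zeppelin's implementation: the `WebApp` record, the `selectDeployable` method, and the use of an exception to tell the caller to shut down are all hypothetical. A missing default resource directory aborts startup; a missing non-default one is only recorded as a warning and skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FailFastWebApps {
  // Hypothetical description of one web app context: a name, the directory holding
  // its static resources, and whether it is the default (root) context.
  record WebApp(String name, Path resourceDir, boolean isDefault) {}

  /**
   * Returns the web apps whose resource directories exist. Throws IllegalStateException
   * if the default web app's directory is missing, so the caller can shut down early.
   * Missing non-default web apps are appended to {@code warnings} and skipped.
   */
  static List<WebApp> selectDeployable(List<WebApp> apps, List<String> warnings) {
    List<WebApp> deployable = new ArrayList<>();
    for (WebApp app : apps) {
      if (Files.isDirectory(app.resourceDir())) {
        deployable.add(app);
      } else if (app.isDefault()) {
        throw new IllegalStateException(
            "Resources for default web app '" + app.name() + "' not found: " + app.resourceDir());
      } else {
        warnings.add("Skipping web app '" + app.name() + "': " + app.resourceDir() + " not found");
      }
    }
    return deployable;
  }

  public static void main(String[] args) throws IOException {
    Path present = Files.createTempDirectory("zeppelin-web");
    Path absent = present.resolve("does-not-exist");

    // Default present, non-default missing: startup continues with a warning.
    List<String> warnings = new ArrayList<>();
    List<WebApp> ok = selectDeployable(
        List.of(new WebApp("default", present, true), new WebApp("other", absent, false)),
        warnings);
    System.out.println(ok.size() + " deployable, warnings: " + warnings);

    // Default missing: fail fast instead of keeping a server without a working UI.
    try {
      selectDeployable(List.of(new WebApp("default", absent, true)), new ArrayList<>());
    } catch (IllegalStateException e) {
      System.out.println("Fail fast: " + e.getMessage());
    }
    Files.delete(present);
  }
}
```

In a real server, the caller would catch the exception during startup and route it into the shutdown path, such as the `shutdown(-1)` call discussed later in this PR.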

What type of PR is it?

Improvement

What is the Jira issue?

  • https://issues.apache.org/jira/browse/ZEPPELIN-6241

How should this be tested?

  • Start the server with the default web app directory intentionally missing.
  • Verify that the server fails to start and exits immediately.
  • Ensure that a missing non-default web app directory does not prevent startup.

Questions:

  • Do the license files need to be updated? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

tbonelee changed the title [ZEPPELIN-XXXX] Fail fast when default web app context fails to initialize due to missing resources to [ZEPPELIN-6225] Fail fast when default web app context fails to initialize due to missing resources on Jul 14, 2025
tbonelee changed the title [ZEPPELIN-6225] Fail fast when default web app context fails to initialize due to missing resources to [ZEPPELIN-6241] Fail fast when default web app context fails to initialize due to missing resources on Jul 14, 2025
Review comment on bin/common.sh

Contributor Author

In the daemon script, when the corresponding path could not be found automatically, it used to be coerced into an empty string. This unintentionally overrides the default values in ZeppelinConfiguration, and worse, the empty string can be interpreted as the current directory.

To avoid this, I modified the script to set the environment variables only when the resolved path actually exists.
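The hazard is easy to reproduce from the Java side: `java.nio` treats an empty path as the current working directory, so an empty value can even pass an existence check. The sketch below shows this and a consumer-side analogue of the script fix, where a blank or non-existent value falls back to a default. `resolveOrDefault` and the `zeppelin-web/dist` default are hypothetical; the actual fix in this PR lives in bin/common.sh, not in Java code.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyPathPitfall {
  // Hypothetical helper: use the configured value only if it is non-blank and exists,
  // otherwise keep the built-in default (mirrors "set the variable only when the path exists").
  static Path resolveOrDefault(String configured, Path defaultDir) {
    if (configured == null || configured.isBlank()) {
      return defaultDir;
    }
    Path candidate = Path.of(configured);
    return Files.exists(candidate) ? candidate : defaultDir;
  }

  public static void main(String[] args) {
    Path empty = Path.of("");
    // An empty path resolves against the working directory...
    System.out.println("empty path -> " + empty.toAbsolutePath());
    // ...and accessing it is equivalent to accessing the working directory.
    System.out.println("exists? " + Files.exists(empty));

    Path fallback = Path.of("zeppelin-web", "dist"); // hypothetical default location
    System.out.println("resolved -> " + resolveOrDefault("", fallback));
  }
}
```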

Reamer commented Jul 15, 2025

Contributor

Changes look good to me. What do you think of the following change?

```diff
08:22 $ git diff
diff --git a/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java b/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
index cd1696cd7..8303de4a5 100644
--- a/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
+++ b/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
@@ -271,7 +271,7 @@ public class ZeppelinServer implements AutoCloseable {
       jettyWebServer.start(); // Instantiates ZeppelinServer
     } catch (Exception e) {
       LOGGER.error("Error while running jettyServer", e);
-      System.exit(-1);
+      shutdown(-1);
     }
 
     LOGGER.info("Done, zeppelin server started");
```

tbonelee commented

Contributor Author

I agree graceful shutdown is safer, so I replaced System.exit(-1) with shutdown(-1) and pushed the change.
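The reasoning behind preferring `shutdown(-1)`: `System.exit(-1)` still runs JVM shutdown hooks, but it bypasses any cleanup the server performs explicitly in its own shutdown path. A minimal sketch, assuming a hypothetical server that tracks its closeable resources; this is not the body of Zeppelin's `shutdown` method. The exit call is injected as an `IntConsumer` (`System::exit` in production) so the sketch can run without terminating the JVM.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntConsumer;

public class GracefulShutdownSketch {
  // Resources are closed in reverse order of registration (LIFO).
  private final Deque<AutoCloseable> resources = new ArrayDeque<>();
  // System::exit in production; a stub here so the sketch can be exercised safely.
  private final IntConsumer exit;

  GracefulShutdownSketch(IntConsumer exit) {
    this.exit = exit;
  }

  void register(AutoCloseable resource) {
    resources.push(resource);
  }

  /** Hypothetical shutdown(int): close every resource, keep going on errors, then exit. */
  void shutdown(int exitCode) {
    while (!resources.isEmpty()) {
      AutoCloseable resource = resources.pop();
      try {
        resource.close();
      } catch (Exception e) {
        System.err.println("Error while closing resource: " + e);
      }
    }
    exit.accept(exitCode);
  }

  public static void main(String[] args) {
    GracefulShutdownSketch server =
        new GracefulShutdownSketch(code -> System.out.println("exit(" + code + ")"));
    server.register(() -> System.out.println("closing resource registered first"));
    server.register(() -> System.out.println("stopping web server"));
    try {
      throw new IllegalStateException("default web app failed to start");
    } catch (Exception e) {
      System.err.println("Error while running jettyServer: " + e.getMessage());
      server.shutdown(-1); // instead of System.exit(-1)
    }
  }
}
```

Closing in reverse registration order and continuing past individual failures follows the same idea as try-with-resources, which closes resources in reverse order of initialization and still closes the remaining ones when one `close()` throws.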

tbonelee commented

Contributor Author

@Reamer
For caution's sake, should we replace this System.exit(-1) with shutdown(-1) as well?

```diff
❯❯❯ git diff
diff --git zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
index 8303de4a5..4753babc4 100644
--- zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
+++ zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
@@ -282,7 +282,7 @@ public class ZeppelinServer implements AutoCloseable {
       }
       if (!errorDatas.isEmpty()) {
         LOGGER.error("{} error(s) while starting - Termination", errorDatas.size());
-        System.exit(-1);
+        shutdown(-1);
       }
     } catch (InterruptedException e) {
       // Many fast unit tests interrupt the Zeppelin server at this point
```

Reamer commented Jul 15, 2025

Contributor

Of course, this has to be tested, but I think it makes perfect sense.

tbonelee commented Jul 15, 2025

Contributor Author

I couldn’t reproduce a construction-time error to hit this path, but I don’t expect shutdown(-1) to cause issues since it just adds cleanup.
I’ve pushed the change, but let me know if you have any concerns.

Reamer left a comment

Contributor

LGTM

Reamer merged commit 25cd340 into apache:master on Jul 17, 2025
15 of 18 checks passed
asf-gitbox-commits pushed a commit that referenced this pull request Jul 17, 2025
…alize due to missing resources

### What is this PR for?
Currently, the server keeps running even when all web app contexts fail to initialize due to missing files or directories for web resources.

In my opinion, if the web resource path for the default web app context does not exist, it would be better to shut down the server immediately, since the context initialization will fail anyway. In such cases, other essential features like REST APIs and WebSocket communication also won’t work properly, so keeping the server running doesn’t seem meaningful.

The absence of non-default web resources, however, seems generally acceptable. So this PR ensures that we only fail fast when the default web app context is missing its required resources.

### What type of PR is it?
Improvement

### What is the Jira issue?
- https://issues.apache.org/jira/browse/ZEPPELIN-6241

### How should this be tested?
- Start the server with the default web app directory intentionally missing.
- Verify that the server fails to start and exits immediately.
- Ensure that a missing non-default web app directory does not prevent startup.

### Questions:
* Do the license files need to be updated? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Closes #4969 from tbonelee/fail-fast.

Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com>
(cherry picked from commit 25cd340)
Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com>
Reamer commented Jul 17, 2025

Contributor

Merged into master and branch-0.12

tbonelee deleted the fail-fast branch on July 17, 2025 13:41