[ZEPPELIN-6241] Fail fast when default web app context fails to initialize due to missing resources#4969
Conversation
There was a problem hiding this comment.
In the daemon script, when the corresponding path could not be found automatically, it used to be coerced into an empty string. This unintentionally overrides the default values in ZeppelinConfiguration, and worse, the empty string can be interpreted as the current directory.
To avoid this, I modified the script to set the environment variables only when the resolved path actually exists.
|
Changes look good to me. What do you think of the following change? 08:22 $ git diff
diff --git a/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java b/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
index cd1696cd7..8303de4a5 100644
--- a/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
+++ b/zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
@@ -271,7 +271,7 @@ public class ZeppelinServer implements AutoCloseable {
jettyWebServer.start(); // Instantiates ZeppelinServer
} catch (Exception e) {
LOGGER.error("Error while running jettyServer", e);
- System.exit(-1);
+ shutdown(-1);
}
LOGGER.info("Done, zeppelin server started"); |
… on jetty server startup failure
|
I agree graceful shutdown is safer, so I replaced |
|
@Reamer ❯❯❯ git diff
diff --git zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
index 8303de4a5..4753babc4 100644
--- zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
+++ zeppelin-server/src/main/java/org/apache/zeppelin/server/ZeppelinServer.java
@@ -282,7 +282,7 @@ public class ZeppelinServer implements AutoCloseable {
}
if (!errorDatas.isEmpty()) {
LOGGER.error("{} error(s) while starting - Termination", errorDatas.size());
- System.exit(-1);
+ shutdown(-1);
}
} catch (InterruptedException e) {
// Many fast unit tests interrupt the Zeppelin server at this point |
|
Of course, this has to be tested, but I think it makes perfect sense. |
|
I couldn’t reproduce a construction-time error to hit this path, but I don’t expect shutdown(-1) to cause issues since it just adds cleanup. |
…alize due to missing resources ### What is this PR for? Currently, the server keeps running even when all web app contexts fail to initialize due to missing files or directories for web resources. In my opinion, if the web resource path for the default web app context does not exist, it would be better to shut down the server immediately, since the context initialization will fail anyway. In such cases, other essential features like REST APIs and WebSocket communication also won’t work properly, so keeping the server running doesn’t seem meaningful. The absence of non-default web resources, however, seems generally acceptable. So this PR ensures that we only fail fast when the default web app context is missing its required resources. ### What type of PR is it? Improvement ### What is the Jira issue? - https://issues.apache.org/jira/browse/ZEPPELIN-6241 ### How should this be tested? - Start the server with the default web app directory intentionally missing. - Verify that the server fails to start and exit immediately. - Ensure that non-default apps can still be missing without preventing startup ### Questions: * Does the license files need to update? No * Is there breaking changes for older versions? No * Does this needs documentation? No Closes #4969 from tbonelee/fail-fast. Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com> (cherry picked from commit 25cd340) Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com>
|
Merged into master and branch-0.12 |
What is this PR for?
Currently, the server keeps running even when all web app contexts fail to initialize due to missing files or directories for web resources.
In my opinion, if the web resource path for the default web app context does not exist, it would be better to shut down the server immediately, since the context initialization will fail anyway. In such cases, other essential features like REST APIs and WebSocket communication also won’t work properly, so keeping the server running doesn’t seem meaningful.
The absence of non-default web resources, however, seems generally acceptable. So this PR ensures that we only fail fast when the default web app context is missing its required resources.
What type of PR is it?
Improvement
What is the Jira issue?
How should this be tested?
Questions: