Closed
Description
Hi guys,
Just spent 2 hours trying to understand a problem that didn't exist :)
Short Summary:
- When doing retries when starting a container, don't log in ERROR when the container fails to start but succeeds after a retry as it's misleading.
Long explanation:
Basically our CI logs had this:
09:30:54.172 [main] ERROR ?.141.59] - Could not start container
org.rnorth.ducttape.TimeoutException: Timeout waiting for result with exception
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
at org.testcontainers.containers.BrowserWebDriverContainer.containerIsStarted(BrowserWebDriverContainer.java:220)
at org.testcontainers.containers.GenericContainer.containerIsStarted(GenericContainer.java:644)
at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:464)
at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:317)
at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:315)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:302)
at org.xwiki.test.docker.internal.junit5.DockerTestUtils.startContainer(DockerTestUtils.java:164)
at org.xwiki.test.docker.internal.junit5.browser.BrowserContainerExecutor.start(BrowserContainerExecutor.java:101)
...
Caused by: org.openqa.selenium.remote.UnreachableBrowserException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: '7962d71f639e', ip: '172.17.0.8', os.name: 'Linux', os.arch: 'amd64', os.version: '4.9.165-xxxx-std-ipv6-64', java.version: '1.8.0_232'
Driver info: driver.version: RemoteWebDriver
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:573)
at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213)
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131)
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:144)
at org.testcontainers.containers.BrowserWebDriverContainer.lambda$null$0(BrowserWebDriverContainer.java:223)
at org.rnorth.ducttape.unreliables.Unreliables.lambda$retryUntilSuccess$0(Unreliables.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: unexpected end of stream on Connection{172.17.0.1:42651, proxy=DIRECT hostAddress=/172.17.0.1:42651 cipherSuite=none protocol=http/1.1}
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:208)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at org.openqa.selenium.remote.internal.OkHttpClient$Factory$1.lambda$createClient$1(OkHttpClient.java:152)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
at okhttp3.RealCall.execute(RealCall.java:77)
at org.openqa.selenium.remote.internal.OkHttpClient.execute(OkHttpClient.java:103)
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:105)
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74)
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552)
... 9 common frames omitted
Caused by: java.io.EOFException: \n not found: limit=0 content=…
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:237)
at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
... 32 common frames omitted
...
09:30:54.209 [main] ERROR ?.141.59] - Log output from the failed container:
...
And we were very surprised to see that the BrowserWebdriverContainer was actually running and the test executed.
After a lot of analysis we found that TC's GenericContainer code logs an ERROR message even if it succeeds later on with a retry. That's very misleading. We believe it should only print a DEBUG message and an ERROR if the last attempt fails.
More specifically it happens in GenericContainter#tryStart()
at:
} catch (Exception e) {
if (e instanceof UndeclaredThrowableException && e.getCause() instanceof Exception) {
e = (Exception) e.getCause();
}
if (e instanceof InvocationTargetException && e.getCause() instanceof Exception) {
e = (Exception) e.getCause();
}
logger().error("Could not start container", e);
if (containerId != null) {
// Log output if startup failed, either due to a container failure or exception (including timeout)
final String containerLogs = getLogs();
if (containerLogs.length() > 0) {
logger().error("Log output from the failed container:\n{}", getLogs());
} else {
logger().error("There are no stdout/stderr logs available for the failed container");
}
}
throw new ContainerLaunchException("Could not create/start container", e);
}
What I would propose to change is:
- Replace the logger().error() by logger().debug() there
- Inside
GenericContainer#doStart()
(the method that callstryStart()
) catch anyRetryCountExceededException
exception and send the ERROR message at that point.
WDYT? That would avoid a lot of misunderstanding and long hours of trying to understand a problem that isn't one ;)
Thanks!