Skip to content

Don't log as error when inside a GenericContainer start and it succeeds on a retry #2483

Closed
@vmassol

Description

@vmassol

Hi guys,

Just spent 2 hours trying to understand a problem that didn't exist :)

Short Summary:

  • When doing retries when starting a container, don't log in ERROR when the container fails to start but succeeds after a retry as it's misleading.

Long explanation:

Basically our CI logs had this:

09:30:54.172 [main] ERROR ?.141.59] - Could not start container
org.rnorth.ducttape.TimeoutException: Timeout waiting for result with exception
        at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:54)
        at org.testcontainers.containers.BrowserWebDriverContainer.containerIsStarted(BrowserWebDriverContainer.java:220)
        at org.testcontainers.containers.GenericContainer.containerIsStarted(GenericContainer.java:644)
        at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:464)
        at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:317)
        at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
        at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:315)
        at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:302)
        at org.xwiki.test.docker.internal.junit5.DockerTestUtils.startContainer(DockerTestUtils.java:164)
        at org.xwiki.test.docker.internal.junit5.browser.BrowserContainerExecutor.start(BrowserContainerExecutor.java:101)
...
Caused by: org.openqa.selenium.remote.UnreachableBrowserException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: '7962d71f639e', ip: '172.17.0.8', os.name: 'Linux', os.arch: 'amd64', os.version: '4.9.165-xxxx-std-ipv6-64', java.version: '1.8.0_232'
Driver info: driver.version: RemoteWebDriver
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:573)
        at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:144)
        at org.testcontainers.containers.BrowserWebDriverContainer.lambda$null$0(BrowserWebDriverContainer.java:223)
        at org.rnorth.ducttape.unreliables.Unreliables.lambda$retryUntilSuccess$0(Unreliables.java:43)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: unexpected end of stream on Connection{172.17.0.1:42651, proxy=DIRECT hostAddress=/172.17.0.1:42651 cipherSuite=none protocol=http/1.1}
        at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:208)
        at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at org.openqa.selenium.remote.internal.OkHttpClient$Factory$1.lambda$createClient$1(OkHttpClient.java:152)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:200)
        at okhttp3.RealCall.execute(RealCall.java:77)
        at org.openqa.selenium.remote.internal.OkHttpClient.execute(OkHttpClient.java:103)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:105)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74)
        at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552)
        ... 9 common frames omitted
Caused by: java.io.EOFException: \n not found: limit=0 content=…
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:237)
        at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
        at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
        ... 32 common frames omitted
...
09:30:54.209 [main] ERROR ?.141.59] - Log output from the failed container:
...

And we were very surprised to see that the BrowserWebdriverContainer was actually running and the test executed.

After a lot of analysis we found that TC's GenericContainer code logs an ERROR message even if it succeeds later on with a retry. That's very misleading. We believe it should only print a DEBUG message and an ERROR if the last attempt fails.

More specifically it happens in GenericContainter#tryStart() at:

        } catch (Exception e) {
            if (e instanceof UndeclaredThrowableException && e.getCause() instanceof Exception) {
                e = (Exception) e.getCause();
            }
            if (e instanceof InvocationTargetException && e.getCause() instanceof Exception) {
                e = (Exception) e.getCause();
            }
            logger().error("Could not start container", e);

            if (containerId != null) {
                // Log output if startup failed, either due to a container failure or exception (including timeout)
                final String containerLogs = getLogs();

                if (containerLogs.length() > 0) {
                    logger().error("Log output from the failed container:\n{}", getLogs());
                } else {
                    logger().error("There are no stdout/stderr logs available for the failed container");
                }
            }

            throw new ContainerLaunchException("Could not create/start container", e);
        }

What I would propose to change is:

  • Replace the logger().error() by logger().debug() there
  • Inside GenericContainer#doStart() (the method that calls tryStart()) catch any RetryCountExceededException exception and send the ERROR message at that point.

WDYT? That would avoid a lot of misunderstanding and long hours of trying to understand a problem that isn't one ;)

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions