Skip to content

Helper server error return code does not fail the test for core = app #440

Open
@nshy

Description

@nshy

It hides memory leak/unexpected panic/crashes/failed assertions.

Repro. Let's make next changes:

diff --git a/src/main.cc b/src/main.cc
index 7dd4f537d..c9de9163f 100644
--- a/src/main.cc
+++ b/src/main.cc
@@ -729,6 +730,8 @@ print_help(FILE *stream)
        fprintf(stream, help_msg, tarantool_version());
 }

+int tarantool_do_panic = 0;
+
 int
 main(int argc, char **argv)
 {
@@ -1101,5 +1104,8 @@ main(int argc, char **argv)
        free((void *)instance.name);
        free((void *)instance.config);
        tarantool_free();
+       if (tarantool_do_panic)
+               panic("AAAAAAA!!!!");
+
        return exit_code;
 }
diff --git a/extra/exports b/extra/exports
index 1dbfedc5c..78db2513b 100644
--- a/extra/exports
+++ b/extra/exportss
@@ -679,3 +679,5 @@ luaM_sysprof_stop
 luaopen_misc

 # }}} LuaJIT API
+
+tarantool_do_panic
diff --git a/test/box/tiny.lua b/test/box/tiny.lua
index 608d48366..2b30f6dfd 100644
--- a/test/box/tiny.lua
+++ b/test/box/tiny.lua
@@ -13,3 +13,7 @@ require('console').listen(os.getenv('ADMIN'))
 box.once('init', function()
     box.schema.user.grant('guest', 'read,write,execute', 'universe')
 end)
+
+local ffi = require('ffi')
+ffi.cdef('int tarantool_do_panic;')
+ffi.C.tarantool_do_panic = 1

The test passes:

test/test-run.py --force --builddir ../build-dev box-tap/session.test
======================================================================================
WORKR TEST                                            RARAMS              RESULT
--------------------------------------------------------------------------------------
[001] box-tap/session.test.lua                                            [ pass ]
--------------------------------------------------------------------------------------

Yet the helper server panics as we planned:

$ grep AAAA /tmp/t/001_box-tap/tiny.log
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!
2024-07-01 11:04:13.672 [50651] main F> AAAAAAA!!!!

In particular such behaviour may hide memory leak issues under ASAN CI.

I tried to prepare a patch for the issue and found the cause. The thing is gevent does not work properly together with subprocess and multiprocess packages. In particular in test-run on helper server error exit code we actually get 0 error code from subprocess.Popen.returncode in

while self.process.returncode is None:
self.process.poll()
if self.process.returncode is None:
gevent.sleep(0.1)

In more details we run main server:

def run_server(execs, cwd, server, logfile, retval, test_id):
os.putenv("LISTEN", server.listen_uri)
with open(logfile, 'ab') as f:
server.process = Popen(execs, stdout=sys.stdout, stderr=f, cwd=cwd)
sampler.register_process(server.process.pid, test_id, server.name)
test_timeout = Options().args.test_timeout
timer = Timer(test_timeout, timeout_handler, (server.process, test_timeout))
timer.start()
retval['returncode'] = server.process.wait()
timer.cancel()
server.process = None

with gevent.Popen. Unlike to subprocess.popen it tracks SIGCHILD signal and reaps ALL processes when receive it. In particular it reaps helper server process. So when we poll for it exit code we cannot find the process and subprocess decides to just report success.

See also #416

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions