CRASH: Fatal error: failed to create trace file ... window.0000 ...., with window.0001 directory #7704

@palmer-dabbelt

Describe the bug
From time to time I have a DynamoRIO crash with an error message along the lines of

Fatal error: failed to create trace file /root/palmerdabbelt/runs/django-mini-drmemtrace_interval-1.OflDffjb/drmemtrace.uwsgi.1937455.5528.dir/raw/window.0000/drmemtrace.uwsgi.1937455.0618.raw.lz4

The resulting trace directory has a window.0001 directory instead of window.0000, so the error message itself seems somewhat sane. I think it's a race -- init_offline_dir() has an unprotected load of the trace window ID:

    if (has_tracing_windows())
        open_new_window_dir(tracing_window.load(std::memory_order_acquire));

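and if the trace window has moved between this load and the point where a thread opens its file, I think the two can disagree on the window number, leaving the thread to write into a window directory that was never created.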

The issue goes away if I just stick a second mkdir in there:

diff --git a/clients/drcachesim/tracer/output.cpp b/clients/drcachesim/tracer/output.cpp
index 63a8b9353..c4e40ff00 100644
--- a/clients/drcachesim/tracer/output.cpp
+++ b/clients/drcachesim/tracer/output.cpp
@@ -398,6 +398,11 @@ open_new_thread_file(void *drcontext, ptr_int_t window_num)
             dr_snprintf(windir, BUFFER_SIZE_ELEMENTS(windir), "%s%s" WINDOW_SUBDIR_FORMAT,
                         logsubdir, DIRSEP, window_num);
             NULL_TERMINATE_BUFFER(windir);
+
+            if (!file_ops_func.create_dir(windir))
+                FATAL("Failed to create window subdir %s\n", windir);
+            NOTIFY(2, "Created new window dir %s\n", windir);
+
             dir = windir;
         } else if (data->file != INVALID_FILE)
             return false;

but I think that warrants a little refactoring to remove the duplicate calls (and code from open_new_window_dir()). I'll post a patch, unless someone has a better idea of what's going on.
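As a rough illustration of the direction such a refactoring could take, here is a standalone sketch (not DynamoRIO code) in which the window directory is created at the point where the per-thread file is opened, and an already-existing directory is not treated as an error. Everything in it is an assumption made for illustration: the helper names `window_dir`, `ensure_window_dir` and `open_thread_file`, the `window.%04ld` format string (guessed from the `window.0000` name in the error message), and the use of `std::filesystem` in place of the tracer's `file_ops_func` hooks.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for the tracer's current window ID.
static std::atomic<long> tracing_window{0};

// Builds "<raw_dir>/window.NNNN". The format is an assumption based on the
// "window.0000" directory name that appears in the error message.
static fs::path window_dir(const fs::path &raw_dir, long window_num)
{
    assert(window_num >= 0);
    char name[32];
    std::snprintf(name, sizeof(name), "window.%04ld", window_num);
    return raw_dir / name;
}

// Makes sure the directory for window_num exists. Calling it again for a
// directory that is already there succeeds, so every caller can invoke it
// without knowing whether some other path created the directory first.
static bool ensure_window_dir(const fs::path &raw_dir, long window_num)
{
    std::error_code ec;
    fs::create_directories(window_dir(raw_dir, window_num), ec);
    return !ec;
}

// Opens a per-thread file inside the directory for window_num. The directory
// is ensured here, using the same window number as the file path, so the two
// cannot disagree even if the global window ID changed since initialization.
static bool open_thread_file(const fs::path &raw_dir, long window_num,
                             const std::string &file_name)
{
    if (!ensure_window_dir(raw_dir, window_num))
        return false;
    std::ofstream out(window_dir(raw_dir, window_num) / file_name);
    return out.is_open();
}
```

The point of the sketch is only that the directory creation and the file creation share one window number and one code path. Whether the real fix should live in open_new_thread_file() or in a shared helper used by open_new_window_dir() is a separate question.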

To Reproduce
(Happy to write more, but I think I'm going to fix it myself.)

  1. I'm using DCPerf's Django-mini workload.
  2. Running with -trace_for_insns 1000 makes this happen almost always.

It's fine without any client, and I haven't tried a debug build.

Expected behavior
I get traces ;)

Screenshots or Pasted Text

Fatal error: failed to create trace file /root/palmerdabbelt/runs/django-mini-drmemtrace_interval-1.OflDffjb/drmemtrace.uwsgi.1937455.5528.dir/raw/window.0000/drmemtrace.uwsgi.1937455.0618.raw.lz4

Versions
I'm running GitHub's main branch from this morning, on an aarch64 machine (CentOS Stream 9-ish).
