Skip to content

[tsan] Fix deadlock with dyld during symbolization on darwin platforms #113661

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

pudge62
Copy link
Contributor

@pudge62 pudge62 commented Oct 25, 2024

On darwin platforms, callbacks registered via _dyld_register_func_for_add_image are invoked with a lock held by dyld. These callbacks, in turn, request locks in the TSan runtime due to instrumented codes or interceptors. Previously, reporting race issues involved holding TSan runtime locks and simultaneously attemping to acquire the dyld lock during symbolization, which leads to deadlock.

This commit restructures the reporting process into 3 distinct steps: data collection, symbolization and printing. Each step now only holds the necessary locks to prevent the deadlock.

@llvmbot
Copy link
Member

llvmbot commented Oct 25, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (pudge62)

Changes

On darwin platforms, callbacks registered via _dyld_register_func_for_add_image are invoked with a lock held by dyld. These callbacks, in turn, request locks in the TSan runtime due to instrumented codes or interceptors. Previously, reporting race issues involved holding TSan runtime locks and simultaneously attemping to acquire the dyld lock during symbolization, which leads to deadlock.

This commit restructures the reporting process into 3 distinct steps: data collection, symbolization and printing. Each step now only holds the necessary locks to prevent the deadlock.


Full diff: https://github.com/llvm/llvm-project/pull/113661.diff

11 Files Affected:

  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp (+1)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp (+1)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp (+1)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_mman.cpp (+1)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_report.h (+2)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_rtl.h (+4-3)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp (+3)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp (+117-40)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_rtl_thread.cpp (+1)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_stack_trace.cpp (+12)
  • (modified) compiler-rt/lib/tsan/rtl/tsan_stack_trace.h (+5)
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp b/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
index 74458028ae8f55..b064663c998554 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp
@@ -108,6 +108,7 @@ bool Symbolizer::SymbolizeData(uptr addr, DataInfo *info) {
                                          &arch))
     return false;
   info->Clear();
+  info->start = addr;
   info->module = internal_strdup(module_name);
   info->module_offset = module_offset;
   info->module_arch = arch;
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
index 9cab2a37271288..346cfcc98da5ee 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
@@ -2069,6 +2069,7 @@ static void ReportErrnoSpoiling(ThreadState *thr, uptr pc, int sig) {
   rep.SetSigNum(sig);
   if (!IsFiredSuppression(ctx, ReportTypeErrnoInSignal, stack)) {
     rep.AddStack(stack, true);
+    rep.SymbolizeReport();
     OutputReport(thr, rep);
   }
 }
diff --git a/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp b/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp
index befd6a369026d8..711f0795df6ecb 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp
@@ -446,6 +446,7 @@ static void ReportMutexHeldWrongContext(ThreadState *thr, uptr pc) {
   VarSizeStackTrace trace;
   ObtainCurrentStack(thr, pc, &trace);
   rep.AddStack(trace, true);
+  rep.SymbolizeReport();
   OutputReport(thr, rep);
 }
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_mman.cpp b/compiler-rt/lib/tsan/rtl/tsan_mman.cpp
index 0705365d77427d..30153c3d13914d 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_mman.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_mman.cpp
@@ -185,6 +185,7 @@ static void SignalUnsafeCall(ThreadState *thr, uptr pc) {
   ThreadRegistryLock l(&ctx->thread_registry);
   ScopedReport rep(ReportTypeSignalUnsafe);
   rep.AddStack(stack, true);
+  rep.SymbolizeReport();
   OutputReport(thr, rep);
 }
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_report.h b/compiler-rt/lib/tsan/rtl/tsan_report.h
index bfe470797f8f77..d1568c1434d0a4 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_report.h
+++ b/compiler-rt/lib/tsan/rtl/tsan_report.h
@@ -16,6 +16,7 @@
 #include "sanitizer_common/sanitizer_thread_registry.h"
 #include "sanitizer_common/sanitizer_vector.h"
 #include "tsan_defs.h"
+#include "tsan_stack_trace.h"
 
 namespace __tsan {
 
@@ -39,6 +40,7 @@ enum ReportType {
 };
 
 struct ReportStack {
+  VarSizeStackTrace raw;
   SymbolizedStack *frames = nullptr;
   bool suppressable = false;
 };
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl.h b/compiler-rt/lib/tsan/rtl/tsan_rtl.h
index f48be8e0a4fe08..0dd6dd2743d225 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl.h
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl.h
@@ -406,6 +406,7 @@ uptr TagFromShadowStackFrame(uptr pc);
 
 class ScopedReportBase {
  public:
+  void SetKindAndTag(ReportType typ, uptr tag);
   void AddMemoryAccess(uptr addr, uptr external_tag, Shadow s, Tid tid,
                        StackTrace stack, const MutexSet *mset);
   void AddStack(StackTrace stack, bool suppressable = false);
@@ -418,17 +419,16 @@ class ScopedReportBase {
   void SetCount(int count);
   void SetSigNum(int sig);
 
+  void SymbolizeReport();
   const ReportDesc *GetReport() const;
 
  protected:
+  ScopedReportBase();
   ScopedReportBase(ReportType typ, uptr tag);
   ~ScopedReportBase();
 
  private:
   ReportDesc *rep_;
-  // Symbolizer makes lots of intercepted calls. If we try to process them,
-  // at best it will cause deadlocks on internal mutexes.
-  ScopedIgnoreInterceptors ignore_interceptors_;
 
   ScopedReportBase(const ScopedReportBase &) = delete;
   void operator=(const ScopedReportBase &) = delete;
@@ -436,6 +436,7 @@ class ScopedReportBase {
 
 class ScopedReport : public ScopedReportBase {
  public:
+  explicit ScopedReport();
   explicit ScopedReport(ReportType typ, uptr tag = kExternalTagNone);
   ~ScopedReport();
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp b/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp
index 2a8aa1915c9aeb..ba63a5f245338b 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp
@@ -62,6 +62,7 @@ static void ReportMutexMisuse(ThreadState *thr, uptr pc, ReportType typ,
   ObtainCurrentStack(thr, pc, &trace);
   rep.AddStack(trace, true);
   rep.AddLocation(addr, 1);
+  rep.SymbolizeReport();
   OutputReport(thr, rep);
 }
 
@@ -548,6 +549,7 @@ void ReportDeadlock(ThreadState *thr, uptr pc, DDReport *r) {
       }
     }
   }
+  rep.SymbolizeReport();
   OutputReport(thr, rep);
 }
 
@@ -572,6 +574,7 @@ void ReportDestroyLocked(ThreadState *thr, uptr pc, uptr addr,
     return;
   rep.AddStack(trace, true);
   rep.AddLocation(addr, 1);
+  rep.SymbolizeReport();
   OutputReport(thr, rep);
 }
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp b/compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp
index 0311df553fdd0a..288d3d6da3dde3 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl_report.cpp
@@ -29,7 +29,7 @@ namespace __tsan {
 
 using namespace __sanitizer;
 
-static ReportStack *SymbolizeStack(StackTrace trace);
+static ReportStack *SymbolizeStack(const StackTrace &trace);
 
 // Can be overriden by an application/test to intercept reports.
 #ifdef TSAN_EXTERNAL_HOOKS
@@ -98,9 +98,10 @@ ReportStack *SymbolizeStackId(u32 stack_id) {
   return SymbolizeStack(stack);
 }
 
-static ReportStack *SymbolizeStack(StackTrace trace) {
+static SymbolizedStack *SymbolizeRawStack(const StackTrace &trace) {
   if (trace.size == 0)
     return 0;
+  CHECK_NE(trace.trace, nullptr);
   SymbolizedStack *top = nullptr;
   for (uptr si = 0; si < trace.size; si++) {
     const uptr pc = trace.trace[si];
@@ -121,12 +122,31 @@ static ReportStack *SymbolizeStack(StackTrace trace) {
     top = ent;
   }
   StackStripMain(top);
+  return top;
+}
+
+static ReportStack *SymbolizeStack(const StackTrace &trace) {
+  if (trace.size == 0)
+    return 0;
+  CHECK_NE(trace.trace, nullptr);
+  auto *stack = New<ReportStack>();
+  stack->frames = SymbolizeRawStack(trace);
+  return stack;
+}
 
+static ReportStack *CreateReportStack(const StackTrace &trace) {
+  if (trace.size == 0)
+    return 0;
+  CHECK_NE(trace.trace, nullptr);
   auto *stack = New<ReportStack>();
-  stack->frames = top;
+  stack->raw.Init(trace);
   return stack;
 }
 
+static ReportStack *CreateReportStack(const StackTrace &&trace) {
+  return CreateReportStack(trace);
+}
+
 bool ShouldReport(ThreadState *thr, ReportType typ) {
   // We set thr->suppress_reports in the fork context.
   // Taking any locking in the fork context can lead to deadlocks.
@@ -155,22 +175,27 @@ bool ShouldReport(ThreadState *thr, ReportType typ) {
   }
 }
 
-ScopedReportBase::ScopedReportBase(ReportType typ, uptr tag) {
-  ctx->thread_registry.CheckLocked();
+ScopedReportBase::ScopedReportBase() {
   rep_ = New<ReportDesc>();
+}
+
+ScopedReportBase::ScopedReportBase(ReportType typ, uptr tag) : ScopedReportBase() {
   rep_->typ = typ;
   rep_->tag = tag;
-  ctx->report_mtx.Lock();
 }
 
 ScopedReportBase::~ScopedReportBase() {
-  ctx->report_mtx.Unlock();
   DestroyAndFree(rep_);
 }
 
+void ScopedReportBase::SetKindAndTag(ReportType typ, uptr tag) {
+  rep_->typ = typ;
+  rep_->tag = tag;
+}
+
 void ScopedReportBase::AddStack(StackTrace stack, bool suppressable) {
   ReportStack **rs = rep_->stacks.PushBack();
-  *rs = SymbolizeStack(stack);
+  *rs = CreateReportStack(stack);
   (*rs)->suppressable = suppressable;
 }
 
@@ -187,8 +212,8 @@ void ScopedReportBase::AddMemoryAccess(uptr addr, uptr external_tag, Shadow s,
   mop->size = size;
   mop->write = !(typ & kAccessRead);
   mop->atomic = typ & kAccessAtomic;
-  mop->stack = SymbolizeStack(stack);
   mop->external_tag = external_tag;
+  mop->stack = CreateReportStack(stack);
   if (mop->stack)
     mop->stack->suppressable = true;
   for (uptr i = 0; i < mset->Size(); i++) {
@@ -216,8 +241,7 @@ void ScopedReportBase::AddThread(const ThreadContext *tctx, bool suppressable) {
   rt->name = internal_strdup(tctx->name);
   rt->parent_tid = tctx->parent_tid;
   rt->thread_type = tctx->thread_type;
-  rt->stack = 0;
-  rt->stack = SymbolizeStackId(tctx->creation_stack_id);
+  rt->stack = CreateReportStack(StackDepotGet(tctx->creation_stack_id));
   if (rt->stack)
     rt->stack->suppressable = suppressable;
 }
@@ -270,7 +294,7 @@ int ScopedReportBase::AddMutex(uptr addr, StackID creation_stack_id) {
   rep_->mutexes.PushBack(rm);
   rm->id = rep_->mutexes.Size() - 1;
   rm->addr = addr;
-  rm->stack = SymbolizeStackId(creation_stack_id);
+  rm->stack = CreateReportStack(StackDepotGet(creation_stack_id));
   return rm->id;
 }
 
@@ -288,7 +312,7 @@ void ScopedReportBase::AddLocation(uptr addr, uptr size) {
     loc->fd_closed = closed;
     loc->fd = fd;
     loc->tid = creat_tid;
-    loc->stack = SymbolizeStackId(creat_stack);
+    loc->stack = CreateReportStack(StackDepotGet(creat_stack));
     rep_->locs.PushBack(loc);
     AddThread(creat_tid);
     return;
@@ -310,7 +334,7 @@ void ScopedReportBase::AddLocation(uptr addr, uptr size) {
     loc->heap_chunk_size = b->siz;
     loc->external_tag = b->tag;
     loc->tid = b->tid;
-    loc->stack = SymbolizeStackId(b->stk);
+    loc->stack = CreateReportStack(StackDepotGet(b->stk));
     rep_->locs.PushBack(loc);
     AddThread(b->tid);
     return;
@@ -324,16 +348,16 @@ void ScopedReportBase::AddLocation(uptr addr, uptr size) {
     AddThread(tctx);
   }
 #endif
-  if (ReportLocation *loc = SymbolizeData(addr)) {
-    loc->suppressable = true;
-    rep_->locs.PushBack(loc);
-    return;
-  }
+  auto *loc = New<ReportLocation>();
+  loc->type = ReportLocationGlobal;
+  loc->global.start = addr;
+  loc->suppressable = true;
+  rep_->locs.PushBack(loc);
 }
 
 #if !SANITIZER_GO
 void ScopedReportBase::AddSleep(StackID stack_id) {
-  rep_->sleep = SymbolizeStackId(stack_id);
+  rep_->sleep = CreateReportStack(StackDepotGet(stack_id));
 }
 #endif
 
@@ -341,8 +365,57 @@ void ScopedReportBase::SetCount(int count) { rep_->count = count; }
 
 void ScopedReportBase::SetSigNum(int sig) { rep_->signum = sig; }
 
+void ScopedReportBase::SymbolizeReport() {
+  // Symbolizer makes lots of intercepted calls. If we try to process them,
+  // at best it will cause deadlocks on internal mutexes.
+  ScopedIgnoreInterceptors ignore_interceptors_;
+  for (uptr i = 0; i < rep_->stacks.Size(); i++) {
+    rep_->stacks[i]->frames = SymbolizeRawStack(rep_->stacks[i]->raw);
+    rep_->stacks[i]->raw.Reset();
+  }
+  for (uptr i = 0; i < rep_->mops.Size(); i++) {
+    if (rep_->mops[i]->stack) {
+      rep_->mops[i]->stack->frames =
+          SymbolizeRawStack(rep_->mops[i]->stack->raw);
+      rep_->mops[i]->stack->raw.Reset();
+    }
+  }
+  for (uptr i = 0; i < rep_->locs.Size(); i++) {
+    if (rep_->locs[i]->stack) {
+      rep_->locs[i]->stack->frames =
+          SymbolizeRawStack(rep_->locs[i]->stack->raw);
+      rep_->locs[i]->stack->raw.Reset();
+    }
+    if (rep_->locs[i]->type == ReportLocationGlobal) {
+      Symbolizer::GetOrInit()->SymbolizeData(rep_->locs[i]->global.start,
+                                             &rep_->locs[i]->global);
+    }
+  }
+  for (uptr i = 0; i < rep_->mutexes.Size(); i++) {
+    if (rep_->mutexes[i]->stack) {
+      rep_->mutexes[i]->stack->frames =
+          SymbolizeRawStack(rep_->mutexes[i]->stack->raw);
+      rep_->mutexes[i]->stack->raw.Reset();
+    }
+  }
+  for (uptr i = 0; i < rep_->threads.Size(); i++) {
+    if (rep_->threads[i]->stack) {
+      rep_->threads[i]->stack->frames =
+          SymbolizeRawStack(rep_->threads[i]->stack->raw);
+      rep_->threads[i]->stack->raw.Reset();
+    }
+  }
+  if (rep_->sleep) {
+    rep_->sleep->frames = SymbolizeRawStack(rep_->sleep->raw);
+    rep_->sleep->raw.Reset();
+  }
+}
+
 const ReportDesc *ScopedReportBase::GetReport() const { return rep_; }
 
+ScopedReport::ScopedReport()
+    : ScopedReportBase() {}
+
 ScopedReport::ScopedReport(ReportType typ, uptr tag)
     : ScopedReportBase(typ, tag) {}
 
@@ -633,6 +706,7 @@ bool OutputReport(ThreadState *thr, const ScopedReport &srep) {
   // It's too late to check them here, we have already taken locks.
   CHECK(flags()->report_bugs);
   CHECK(!thr->suppress_reports);
+  Lock l(&ctx->report_mtx);
   atomic_store_relaxed(&ctx->last_symbolize_time_ns, NanoTime());
   const ReportDesc *rep = srep.GetReport();
   CHECK_EQ(thr->current_report, nullptr);
@@ -705,18 +779,12 @@ static bool SpuriousRace(Shadow old) {
   return last.sid() == old.sid() && last.epoch() == old.epoch();
 }
 
-void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
-                AccessType typ0) {
-  CheckedMutex::CheckNoLocks();
-
-  // Symbolizer makes lots of intercepted calls. If we try to process them,
-  // at best it will cause deadlocks on internal mutexes.
-  ScopedIgnoreInterceptors ignore;
-
+static bool BuildRaceReport(ScopedReport &rep, ThreadState *thr, RawShadow *shadow_mem,
+                     Shadow &cur, Shadow &old, AccessType typ0) {
   uptr addr = ShadowToMem(shadow_mem);
   DPrintf("#%d: ReportRace %p\n", thr->tid, (void *)addr);
   if (!ShouldReport(thr, ReportTypeRace))
-    return;
+    return false;
   uptr addr_off0, size0;
   cur.GetAccess(&addr_off0, &size0, nullptr);
   uptr addr_off1, size1, typ1;
@@ -724,9 +792,9 @@ void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
   if (!flags()->report_atomic_races &&
       ((typ0 & kAccessAtomic) || (typ1 & kAccessAtomic)) &&
       !(typ0 & kAccessFree) && !(typ1 & kAccessFree))
-    return;
+    return false;
   if (SpuriousRace(old))
-    return;
+    return false;
 
   const uptr kMop = 2;
   Shadow s[kMop] = {cur, old};
@@ -737,7 +805,7 @@ void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
   uptr addr_min = min(addr0, addr1);
   uptr addr_max = max(end0, end1);
   if (IsExpectedReport(addr_min, addr_max - addr_min))
-    return;
+    return false;
 
   ReportType rep_typ = ReportTypeRace;
   if ((typ0 & kAccessVptr) && (typ1 & kAccessFree))
@@ -748,7 +816,7 @@ void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
     rep_typ = ReportTypeUseAfterFree;
 
   if (IsFiredSuppression(ctx, rep_typ, addr))
-    return;
+    return false;
 
   VarSizeStackTrace traces[kMop];
   Tid tids[kMop] = {thr->tid, kInvalidTid};
@@ -756,7 +824,7 @@ void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
 
   ObtainCurrentStack(thr, thr->trace_prev_pc, &traces[0], &tags[0]);
   if (IsFiredSuppression(ctx, rep_typ, traces[0]))
-    return;
+    return false;
 
   DynamicMutexSet mset1;
   MutexSet *mset[kMop] = {&thr->mset, mset1};
@@ -767,18 +835,18 @@ void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
   ThreadRegistryLock l0(&ctx->thread_registry);
   Lock slots_lock(&ctx->slot_mtx);
   if (SpuriousRace(old))
-    return;
+    return false;
   if (!RestoreStack(EventType::kAccessExt, s[1].sid(), s[1].epoch(), addr1,
                     size1, typ1, &tids[1], &traces[1], mset[1], &tags[1])) {
     StoreShadow(&ctx->last_spurious_race, old.raw());
-    return;
+    return false;
   }
 
   if (IsFiredSuppression(ctx, rep_typ, traces[1]))
-    return;
+    return false;
 
   if (HandleRacyStacks(thr, traces))
-    return;
+    return false;
 
   // If any of the accesses has a tag, treat this as an "external" race.
   uptr tag = kExternalTagNone;
@@ -789,8 +857,7 @@ void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
       break;
     }
   }
-
-  ScopedReport rep(rep_typ, tag);
+  rep.SetKindAndTag(rep_typ, tag);
   for (uptr i = 0; i < kMop; i++)
     rep.AddMemoryAccess(addr, tags[i], s[i], tids[i], traces[i], mset[i]);
 
@@ -819,6 +886,16 @@ void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
       s[1].epoch() <= thr->last_sleep_clock.Get(s[1].sid()))
     rep.AddSleep(thr->last_sleep_stack_id);
 #endif
+  return true;
+}
+
+void ReportRace(ThreadState *thr, RawShadow *shadow_mem, Shadow cur, Shadow old,
+                AccessType typ0) {
+  ScopedReport rep;
+  if (!BuildRaceReport(rep, thr, shadow_mem, cur, old, typ0)) {
+    return;
+  }
+  rep.SymbolizeReport();
   OutputReport(thr, rep);
 }
 
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl_thread.cpp b/compiler-rt/lib/tsan/rtl/tsan_rtl_thread.cpp
index 5316a7862e449c..ec5e919e1e1cf3 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl_thread.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl_thread.cpp
@@ -96,6 +96,7 @@ void ThreadFinalize(ThreadState *thr) {
     ScopedReport rep(ReportTypeThreadLeak);
     rep.AddThread(leaks[i].tctx, true);
     rep.SetCount(leaks[i].count);
+    rep.SymbolizeReport();
     OutputReport(thr, rep);
   }
 #endif
diff --git a/compiler-rt/lib/tsan/rtl/tsan_stack_trace.cpp b/compiler-rt/lib/tsan/rtl/tsan_stack_trace.cpp
index 9bbaafb3a85f59..8dd42006e1fe96 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_stack_trace.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_stack_trace.cpp
@@ -38,11 +38,23 @@ void VarSizeStackTrace::Init(const uptr *pcs, uptr cnt, uptr extra_top_pc) {
     trace_buffer[cnt] = extra_top_pc;
 }
 
+void VarSizeStackTrace::Init(const StackTrace &trace) {
+  ResizeBuffer(trace.size);
+  internal_memcpy(trace_buffer, trace.trace, trace.size * sizeof(trace.trace[0]));
+}
+
 void VarSizeStackTrace::ReverseOrder() {
   for (u32 i = 0; i < (size >> 1); i++)
     Swap(trace_buffer[i], trace_buffer[size - 1 - i]);
 }
 
+void VarSizeStackTrace::Reset() {
+  size = 0;
+  trace = nullptr;
+  Free(trace_buffer);
+  trace_buffer = nullptr;
+}
+
 }  // namespace __tsan
 
 #if !SANITIZER_GO
diff --git a/compiler-rt/lib/tsan/rtl/tsan_stack_trace.h b/compiler-rt/lib/tsan/rtl/tsan_stack_trace.h
index 3eb8ce156e8358..deda28b5ac8560 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_stack_trace.h
+++ b/compiler-rt/lib/tsan/rtl/tsan_stack_trace.h
@@ -24,12 +24,17 @@ struct VarSizeStackTrace : public StackTrace {
 
   VarSizeStackTrace();
   ~VarSizeStackTrace();
+
   void Init(const uptr *pcs, uptr cnt, uptr extra_top_pc = 0);
+  void Init(const StackTrace &trace);
 
   // Reverses the current stack trace order, the top frame goes to the bottom,
   // the last frame goes to the top.
   void ReverseOrder();
 
+  // Clear and release internal buffer.
+  void Reset();
+
  private:
   void ResizeBuffer(uptr new_size);
 

Copy link

github-actions bot commented Oct 25, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@pudge62 pudge62 force-pushed the tsan/fix_dyld_deadlock branch from 6b918e4 to 99d38b1 Compare October 25, 2024 09:42
@pudge62
Copy link
Contributor Author

pudge62 commented Oct 25, 2024

cc @kcc @dvyukov @vitalybuka for review

@pudge62 pudge62 force-pushed the tsan/fix_dyld_deadlock branch 3 times, most recently from a6fa821 to 8a3851b Compare October 28, 2024 04:30
@pudge62
Copy link
Contributor Author

pudge62 commented Nov 13, 2024

Ping

1 similar comment
@pudge62
Copy link
Contributor Author

pudge62 commented Dec 2, 2024

Ping

@vitalybuka
Copy link
Collaborator

TSAN part contains a lot of changes.
Would be possible to split them into separate PR with exactly explanation.

@vitalybuka vitalybuka requested review from melver and dvyukov December 2, 2024 23:44
@pudge62 pudge62 force-pushed the tsan/fix_dyld_deadlock branch from 8a3851b to 2fa70e1 Compare December 3, 2024 03:17
@pudge62
Copy link
Contributor Author

pudge62 commented Dec 3, 2024

TSAN part contains a lot of changes. Would be possible to split them into separate PR with exactly explanation.

Sorry, this PR is specifically focused on fixing the deadlock issue, I believe it would be challenging to split it into separate PRs

@pudge62 pudge62 requested a review from vitalybuka December 6, 2024 06:06
@pudge62 pudge62 changed the title [tsan] Fix deadlock with dyld during symbolization on darwin platforms [tsan] [WIP] Fix deadlock with dyld during symbolization on darwin platforms Dec 6, 2024
@pudge62 pudge62 changed the title [tsan] [WIP] Fix deadlock with dyld during symbolization on darwin platforms [tsan] Fix deadlock with dyld during symbolization on darwin platforms Dec 6, 2024
@pudge62 pudge62 marked this pull request as draft December 6, 2024 11:06
@pudge62 pudge62 force-pushed the tsan/fix_dyld_deadlock branch 4 times, most recently from ab7b54f to c8dd365 Compare December 17, 2024 10:57
On darwin platforms, callbacks registered via `_dyld_register_func_for_add_image`
are invoked with a lock held by dyld. These callbacks, in turn, request locks in
the TSan runtime due to instrumented codes or interceptors. Previously, reporting
race issues involved holding TSan runtime locks and simultaneously attemping to
acquire the dyld lock during symbolization, which leads to deadlock.

This commit restructures the reporting process into 3 distinct steps: data
collection, symbolization and printing. Each step now only holds the necessary
locks to prevent the deadlock.
@pudge62 pudge62 force-pushed the tsan/fix_dyld_deadlock branch from c8dd365 to f0bd42b Compare December 17, 2024 11:30
@pudge62 pudge62 marked this pull request as ready for review December 18, 2024 08:39
@pudge62
Copy link
Contributor Author

pudge62 commented Dec 18, 2024

@vitalybuka @melver @dvyukov Hi guys, I found my changes introduced another deadlock, because the lock ordering was changed. I have fixed it In the latest update. Here’s the new pattern for reporting an issue:

  ScopedReport rep(typ);
  { // 1. Build the report
    ThreadRegistryLock l(&ctx->thread_registry);
    rep.AddStack(trace, true);
    rep.AddLocation(addr, 1);
  }
  rep.SymbolizeReport(); // 2. Symbolize the report
  OutputReport(thr, rep); // 3. Output the report 

I removed the ScopedErrorReportLock in ScopedReport class, which caused the new deadlock before. While symbolizing the report, we don't hold any locks in the runtime, which prevents the deadlock with dyld. The logging step is made atomic by holding the ctx->report_mtx lock in the OutputReport function.

@dvyukov
Copy link
Collaborator

dvyukov commented Dec 18, 2024

I removed the ScopedErrorReportLock in ScopedReport class

Why was it needed/why is not it needed anymore?
Things were these on a purpose generally.

@dvyukov
Copy link
Collaborator

dvyukov commented Dec 18, 2024

I found my changes introduced another deadlock

Why wasn't it detected before by the internal deadlock detector? It should have been warned on any unit test that exercises that code.

@pudge62
Copy link
Contributor Author

pudge62 commented Dec 18, 2024

I removed the ScopedErrorReportLock in ScopedReport class

Why was it needed/why is not it needed anymore?
Things were these on a purpose generally.

I found my changes introduced another deadlock

Why wasn't it detected before by the internal deadlock detector? It should have been warned on any unit test that exercises that code.

I believe the ScopedErrorReportLock was originally used to make the entire reporting process atomic. However, I have now divided the process into separate steps, rendering it unnecessary. Additionally, the new deadlock introduced by my previous changes was also related to ScopedErrorReportLock. This occurred because it was locked first only in the ReportRace function, whereas in other situations, the ThreadRegistryLock might be locked first.

Regarding the internal deadlock detector, are you referring to sanitizer_deadlock_detector1.cpp? It can't detect deadlocks within the runtime, right?

@vitalybuka vitalybuka removed their request for review February 20, 2025 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants