@neethuhaneesha
Contributor

Cherry-pick of #12456

Fix for fdbdecode memory errors.
The fdbdecode command was failing with memory-related errors such as:
double free or corruption (!prev), free(): invalid pointer, munmap_chunk(): invalid pointer, Segmentation fault
These errors occurred only during the program's shutdown phase, after all decoding work had completed. They did not affect the correctness of the decoded key-value output.

Root Cause
Valgrind analysis revealed that the crashes were caused by static object destruction order issues, leading to use-after-free and double-free situations.
Issue 1: EventCacheHolder
A static EventCacheHolder instance invoked clear() during its destruction, which accessed a LatestEventCache object that had already been destroyed.
Issue 2: BlobStats
Another static object, a std::unique_ptr to S3BlobStoreEndpoint::BlobStats, holds an EventCacheHolder through its LatencySample member. During shutdown, destroying it triggered the same invalid access to the already-destroyed LatestEventCache described above.

==40744== Invalid read of size 2
==40744==    at 0x14571E4: operator< (NetworkAddress.h:64)
==40744==    by 0x14571E4: operator() (stl_function.h:400)
==40744==    by 0x14571E4: _M_lower_bound (stl_tree.h:1905)
==40744==    by 0x14571E4: lower_bound (stl_tree.h:1270)
==40744==    by 0x14571E4: lower_bound (stl_map.h:1259)
==40744==    by 0x14571E4: operator[] (stl_map.h:517)
==40744==    by 0x14571E4: LatestEventCache::clear(std::string const&) (Trace.cpp:632)
==40744==    by 0x1243F58: ~EventCacheHolder (Trace.h:524)
==40744==    by 0x1243F58: delref (FastRef.h:70)
==40744==    by 0x1243F58: delref<EventCacheHolder> (FastRef.h:95)
==40744==    by 0x1243F58: ~Reference (FastRef.h:126)
==40744==    by 0x1243F58: CounterCollectionImpl::TraceCountersActorState<CounterCollectionImpl::TraceCountersActor>::~TraceCountersActorState() (Stats.actor.g.cpp:162)
==40744==    by 0x1244270: a_body1Catch1 (Stats.actor.g.cpp:188)
==40744==    by 0x1244270: CounterCollectionImpl::TraceCountersActorState<CounterCollectionImpl::TraceCountersActor>::a_callback_error(ActorCallback<CounterCollectionImpl::TraceCountersActor, 1, Void>*, Error) (Stats.actor.g.cpp:417)
==40744==    by 0x69691E: delFutureRef (flow.h:866)
==40744==    by 0x69691E: delFutureRef (flow.h:863)
==40744==    by 0x69691E: Future<Void>::~Future() (flow.h:948)
==40744==    by 0x498A2DC: __run_exit_handlers (in /usr/lib64/libc.so.6)
==40744==    by 0x498A42F: exit (in /usr/lib64/libc.so.6)
==40744==    by 0x49725D6: (below main) (in /usr/lib64/libc.so.6)

==40744== Invalid read of size 8
==40744==    at 0x1452988: _M_lower_bound (stl_tree.h:1904)
==40744==    by 0x1452988: lower_bound (stl_tree.h:1270)
==40744==    by 0x1452988: lower_bound (stl_map.h:1259)
==40744==    by 0x1452988: clearPrefix_internal(std::map<std::string, TraceEventFields, std::less<std::string>, std::allocator<std::pair<std::string const, TraceEventFields> > >&, std::string const&) (Trace.cpp:627)
==40744==    by 0x1457232: LatestEventCache::clear(std::string const&) (Trace.cpp:632)
==40744==    by 0xE91777: ~EventCacheHolder (Trace.h:524)
==40744==    by 0xE91777: delref (FastRef.h:70)
==40744==    by 0xE91777: delref<EventCacheHolder> (FastRef.h:95)
==40744==    by 0xE91777: ~Reference (FastRef.h:126)
==40744==    by 0xE91777: ~LatencySample (Stats.h:227)
==40744==    by 0xE91777: ~BlobStats (S3BlobStore.h:88)
==40744==    by 0xE91777: operator() (unique_ptr.h:85)
==40744==    by 0xE91777: std::unique_ptr<S3BlobStoreEndpoint::BlobStats, std::default_delete<S3BlobStoreEndpoint::BlobStats> >::~unique_ptr() (unique_ptr.h:361)
==40744==    by 0x498A2DC: __run_exit_handlers (in /usr/lib64/libc.so.6)
==40744==    by 0x498A42F: exit (in /usr/lib64/libc.so.6)
==40744==    by 0x49725D6: (below main) (in /usr/lib64/libc.so.6)
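
The destruction-order hazard described under Root Cause can be sketched in isolation. The following is a minimal illustration, not the patch in this PR: Cache, leakedCache, Holder, and g_holder are hypothetical stand-ins for LatestEventCache and EventCacheHolder, and the mitigation shown (a heap-allocated cache that is intentionally never destroyed) is one common way to avoid this class of bug, assumed here for illustration only.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a process-wide event cache.
struct Cache {
    std::map<std::string, int> entries;

    // Erase every key that starts with `prefix`. Keys sharing a prefix are
    // contiguous in a sorted map, so one lower_bound lookup finds the first.
    void clear(const std::string& prefix) {
        auto it = entries.lower_bound(prefix);
        while (it != entries.end() &&
               it->first.compare(0, prefix.size(), prefix) == 0) {
            it = entries.erase(it);
        }
    }
};

// Hazardous variant (not used here): `static Cache cache; return cache;`
// If the cache is first touched after some static Holder was constructed,
// the cache is destroyed before that Holder at exit, and the Holder's
// destructor then reads a dead std::map.
//
// Mitigation assumed for this sketch: allocate once and never delete, so the
// cache outlives every static destructor.
Cache& leakedCache() {
    static Cache* cache = new Cache();  // intentionally leaked
    return *cache;
}

// Hypothetical stand-in for an object that clears its cache entries when it
// is destroyed.
struct Holder {
    std::string name;
    explicit Holder(std::string n) : name(std::move(n)) {}
    ~Holder() { leakedCache().clear(name); }
};

// A static Holder: its destructor runs during exit(), which is safe here
// because the leaked cache is still alive at that point.
static Holder g_holder("blobstats");
```

In this sketch the leak is deliberate: the operating system reclaims the memory at process exit, and the cost is that leak checkers may report the allocation as still reachable. A scoped Holder behaves the same way as the static one: after it goes out of scope, every entry in leakedCache().entries whose key starts with its name is gone, and unrelated keys remain.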

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: ad71843
  • Duration 0:12:00
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Collaborator

@spraza spraza left a comment

You'll also need one more approval for the release branch; maybe @jzhou77 or @saintstack can do it?

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: ad71843
  • Duration 0:46:48
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: ad71843
  • Duration 0:49:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: ad71843
  • Duration 0:50:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: ad71843
  • Duration 1:05:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: ad71843
  • Duration 1:11:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@neethuhaneesha
Contributor Author

neethuhaneesha commented Oct 22, 2025

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: ad71843
  • Duration 0:12:00
  • Result: ❌ FAILED
  • Error: Error while executing command: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ${HOME}/.ssh_key ec2-user@${MAC_EC2_HOST} /usr/local/bin/bash --login -c ./build_pr_macos.sh. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jzhou77 The macOS build is failing with the system time issue "ninja: error: manifest 'build.ninja' still dirty after 100 tries, perhaps system time is not set". Can you please adjust the time? I forgot which server it needs to be done on.

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: ec856c6
  • Duration 0:49:45
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: ec856c6
  • Duration 0:49:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: ec856c6
  • Duration 1:12:32
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: ec856c6
  • Duration 1:15:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: ec856c6
  • Duration 1:21:06
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: ec856c6
  • Duration 1:22:58
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
