Description
AI summary but human verified:
We're running TDengine 3.4.0.2 Community on macOS ARM64 (Apple Silicon) for local development. We have 10 streams processing collectd metrics and a TMQ consumer subscribed to the raw collectd supertables. taosd consistently crashes with a null pointer dereference after 30-60 seconds of TMQ polling.
The crash is in addTagPseudoColumnData. It looks up the tag value (like host) for a subtable that a stream is concurrently writing to and gets back a NULL pointer because the subtable's metadata isn't fully initialized yet. It then passes that pointer straight to memmove via doCopyNItems, and memmove dereferences it.
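To make the failure mode described above concrete, here is a minimal standalone sketch in C. It is not TDengine source: copy_n_items and copy_n_items_checked are hypothetical names that only imitate the "copy one tag value into N rows" pattern suggested by the backtrace, and the guard is one possible defensive check, not a proposed patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a helper that replicates one tag value
 * into n consecutive slots. NOT TDengine's actual doCopyNItems. */
static void copy_n_items(char *dst, const char *item, size_t item_len, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        /* With item == NULL and item_len > 0 this is undefined behavior
         * and typically faults at address 0x0, as in the lldb output. */
        memmove(dst + i * item_len, item, item_len);
    }
}

/* Guarded variant (assumption: the caller can handle a "tag not
 * available yet" result). Returns 0 on success, -1 if a pointer is NULL. */
static int copy_n_items_checked(char *dst, const char *item, size_t item_len, size_t n) {
    if (dst == NULL || item == NULL) {
        return -1; /* refuse to copy instead of dereferencing NULL */
    }
    copy_n_items(dst, item, item_len, n);
    return 0;
}
```

With a valid tag value, the checked variant fills the destination as before; with a NULL tag pointer, it returns -1 so the caller can skip or retry the block instead of crashing the whole process.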
lldb:
* thread #91, name = 'vnode-query', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: libsystem_platform.dylib`_platform_memmove + 168
frame #1: taosd`doCopyNItems + 260
frame #2: taosd`addTagPseudoColumnData + 1280
frame #3: taosd`doQueueScanNext + 1204
frame #4: taosd`getNextBlockFromDownstreamImpl + 312
frame #5: taosd`getNextBlockFromDownstream + 32
frame #6: taosd`doProjectOperation + 260
frame #7: taosd`qExecTask + 576
frame #8: taosd`getDataBlock + 132
frame #9: taosd`tqScanData + 164
frame #10: taosd`tqExtractDataForMq + 688
frame #11: taosd`tqProcessPollReq + 984
frame #12: taosd`vnodeProcessQueryMsg + 344
frame #13: taosd`vmProcessQueryQueue + 216
frame #14: taosd`tQueryAutoQWorkerThreadFp + 632
At the time of the crash, a snode-stream-runner thread was actively running a multi-table JOIN stream (core_stream) that writes to an output supertable in the same vgroup. That thread's backtrace shows 5 levels of mInnerJoinDo → mJoinMainProcess → doProjectOperation → streamExecuteTask. So we have concurrent stream writes and TMQ reads touching the same vgroup's subtable metadata.
Our setup:
- 6 raw collectd supertables (cpuavg_value, cpumax_value, memory_value, disk_value, if_0, if_1)
- 10 streams: 2 INTERVAL(1s) rate streams, 6 PERIOD(1m) minute aggregation streams, 1 INTERVAL(1m) JOIN stream, 1 PERIOD(1h) hourly stream
- 1 TMQ consumer subscribed to the 6 raw supertables plus 2 stream output tables
- Single collectd host (so only 1 subtable per supertable)
- macOS 15, Apple M3, TDengine 3.4.0.2 Community
The crash is 100% reproducible. taosd starts fine, streams run fine, TMQ connects and polls data successfully for 30-60 seconds, then hits this SEGV. Without TMQ consumers connected, taosd runs indefinitely with no issues.
We haven't been able to confirm whether Linux is affected. Our production setup runs the same stream + TMQ configuration on Ubuntu aarch64 and x86_64 without this crash, but the timing may just be different there.
Environment:
- TDengine: 3.4.0.2 Community Edition
- OS: macOS 15 (Sequoia), Apple M3 (ARM64)
- Single-node deployment