
taosd SEGV in addTagPseudoColumnData when TMQ polls while streams write to same vgroup (macOS ARM64) #34544

@brunohaid


AI summary but human verified:

We're running TDengine 3.4.0.2 Community on macOS ARM64 (Apple Silicon) for local development. We have 10 streams processing collectd metrics and a TMQ consumer subscribed to the raw collectd supertables. taosd consistently crashes with a null pointer dereference after 30-60 seconds of TMQ polling.

The crash is in addTagPseudoColumnData. It looks up the tag value (such as host) for a subtable that a stream is concurrently writing to and gets back a NULL pointer because the subtable's metadata isn't fully initialized yet. It then passes that pointer straight to memmove via doCopyNItems, and memmove dereferences it.
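To illustrate the failure mode, here is a minimal sketch of the kind of NULL guard that would avoid the dereference. It is not TDengine source: the function name copy_tag_n_items and its parameters are hypothetical, and the assumption is only that a single tag value is copied into every row slot of an output column, as the doCopyNItems frame suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT TDengine code: copies one tag value into
 * n_items consecutive slots of dst, the way a tag pseudo-column is
 * filled for every row of a data block. Returns false instead of
 * calling memmove when the tag value is missing. */
static bool copy_tag_n_items(char *dst, const char *tag_val,
                             size_t item_len, size_t n_items)
{
    assert(dst != NULL);

    /* The guard: a NULL source is reported to the caller rather than
     * passed to memmove, which would fault at address 0x0. */
    if (tag_val == NULL || item_len == 0) {
        return false;
    }

    for (size_t i = 0; i < n_items; ++i) {
        memmove(dst + i * item_len, tag_val, item_len);
    }
    return true;
}
```

With a guard like this, the caller could mark the column as NULL for that block or retry once the subtable's metadata is available. Whether either is the right behaviour for TMQ is for the maintainers to decide.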

lldb:

* thread #91, name = 'vnode-query', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: libsystem_platform.dylib`_platform_memmove + 168
    frame #1: taosd`doCopyNItems + 260
    frame #2: taosd`addTagPseudoColumnData + 1280
    frame #3: taosd`doQueueScanNext + 1204
    frame #4: taosd`getNextBlockFromDownstreamImpl + 312
    frame #5: taosd`getNextBlockFromDownstream + 32
    frame #6: taosd`doProjectOperation + 260
    frame #7: taosd`qExecTask + 576
    frame #8: taosd`getDataBlock + 132
    frame #9: taosd`tqScanData + 164
    frame #10: taosd`tqExtractDataForMq + 688
    frame #11: taosd`tqProcessPollReq + 984
    frame #12: taosd`vnodeProcessQueryMsg + 344
    frame #13: taosd`vmProcessQueryQueue + 216
    frame #14: taosd`tQueryAutoQWorkerThreadFp + 632

At the time of the crash, a snode-stream-runner thread was actively running a multi-table JOIN stream (core_stream) that writes to an output supertable in the same vgroup. That thread's backtrace shows 5 levels of mInnerJoinDo → mJoinMainProcess → doProjectOperation → streamExecuteTask. So we have concurrent stream writes and TMQ reads touching the same vgroup's subtable metadata.

Our setup:

  • 6 raw collectd supertables (cpuavg_value, cpumax_value, memory_value, disk_value, if_0, if_1)
  • 10 streams: 2 INTERVAL(1s) rate streams, 6 PERIOD(1m) minute aggregation streams, 1 INTERVAL(1m) JOIN stream, 1 PERIOD(1h) hourly stream
  • 1 TMQ consumer subscribed to the 6 raw supertables plus 2 stream output tables
  • Single collectd host (so only 1 subtable per supertable)
  • macOS 15, Apple M3, TDengine 3.4.0.2 Community

The crash is 100% reproducible. taosd starts fine, streams run fine, TMQ connects and polls data successfully for 30-60 seconds, then hits this SEGV. Without TMQ consumers connected, taosd runs indefinitely with no issues.

We haven't been able to reproduce this on Linux. Our production setup uses the same stream + TMQ configuration on Ubuntu aarch64 and x86_64 without this crash, but the timing may just be different there.

Environment:

  • TDengine: 3.4.0.2 Community Edition
  • OS: macOS 15 (Sequoia), Apple M3 (ARM64)
  • Single-node deployment


Labels: bug
