dialog: fix use-after-free and race in cluster replication #3860

Open
NormB wants to merge 1 commit into OpenSIPS:master from NormB:fix/dialog-cluster-sipi-crash

NormB commented Mar 31, 2026

Summary

Fix three bugs triggered when SIP-I messages with binary ISUP data are replicated across a dialog cluster with reinvite pinging enabled.

  • use-after-free in dlg_replicated_create: once _link_dlg_unsafe() has linked the dialog into the hash table, a DLG_BIN_POP failure jumped to pre_linking_error, which calls destroy_dlg() without unlinking first and so leaves a dangling pointer in the hash chain
  • TOCTOU race in write_dialog_vars: the read lock was released between the sizing pass and the write pass, so a concurrent store_dlg_value() could change the vals list after it had been measured and overflow the buffer
  • OOB read in strip_esc: *(c+1) reads past the end of the string when the last byte is a backslash
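
To make the second bullet concrete, here is a minimal sketch of the locking pattern: take the read lock once and keep it across both the sizing pass and the write pass. This is not the OpenSIPS code. `toy_val`, `toy_vals` and `toy_serialize` are hypothetical names, and a POSIX `pthread_rwlock_t` stands in for vals_lock.

```c
#define _POSIX_C_SOURCE 200809L  /* expose pthread_rwlock_* under strict -std */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the dialog vals list. */
struct toy_val {
    const char *data;
    size_t len;
    struct toy_val *next;
};

/* Hypothetical stand-in for the dialog: the list plus the lock guarding it. */
struct toy_vals {
    pthread_rwlock_t lock;
    struct toy_val *head;
};

/*
 * Serialize all values into one freshly allocated buffer.
 * The read lock is taken once and held across BOTH passes, so a writer
 * cannot change the list between measuring it and copying it.
 * Returns the buffer (caller frees) and stores its length in *out_len,
 * or returns NULL if the allocation fails.
 */
static char *toy_serialize(struct toy_vals *v, size_t *out_len)
{
    struct toy_val *it;
    size_t total = 0, off = 0;
    char *buf;

    assert(v != NULL && out_len != NULL);

    pthread_rwlock_rdlock(&v->lock);

    /* Pass 1: compute the exact size needed. */
    for (it = v->head; it; it = it->next)
        total += it->len;

    /* +1 for a terminator; also keeps the allocation non-zero. */
    buf = malloc(total + 1);
    if (buf) {
        /* Pass 2: copy the same list that was just measured. */
        for (it = v->head; it; it = it->next) {
            memcpy(buf + off, it->data, it->len);
            off += it->len;
        }
        buf[off] = '\0';
        *out_len = off;
    }

    pthread_rwlock_unlock(&v->lock);
    return buf;
}
```

Build with `-pthread`. If the lock were dropped after pass 1, a writer could append a value before pass 2 and the copy loop would write more than `total` bytes, which is the overflow the bullet describes.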

Reproduction

2-node cluster, 16 workers, reinvite_ping_interval=5, 300 CPS with multipart SDP+ISUP bodies and concurrent re-INVITEs. Unpatched: SIGSEGV in free_dlg_dlg() → shm_free(0xabcdefedabcdefed) (freed-memory poison). Patched: same load, zero crashes.
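
A pointer value made of freed-memory poison is what you would expect when a worker walks a hash chain that still contains a freed entry, as in the first bug. The sketch below shows the cleanup ordering involved, using a toy chain rather than the real dialog table. `toy_dlg`, `toy_bucket`, `toy_link`, `toy_unlink` and `toy_create` are hypothetical names; only the two label names are borrowed from the commit message, and the bucket lock is left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dialog cell in one hash bucket. */
struct toy_dlg {
    int id;
    struct toy_dlg *next;
};

/* Hypothetical bucket: a singly linked chain (locking omitted). */
struct toy_bucket {
    struct toy_dlg *first;
};

static void toy_link(struct toy_bucket *b, struct toy_dlg *d)
{
    d->next = b->first;
    b->first = d;
}

static void toy_unlink(struct toy_bucket *b, struct toy_dlg *d)
{
    struct toy_dlg **pp = &b->first;

    while (*pp && *pp != d)
        pp = &(*pp)->next;
    if (*pp)
        *pp = d->next;
    d->next = NULL;
}

/*
 * Create an entry, link it, then run a later step that may fail.
 * parse_ok simulates whether that post-link step succeeds.
 * Returns 0 on success, -1 on failure.
 */
static int toy_create(struct toy_bucket *b, int id, int parse_ok)
{
    struct toy_dlg *d;

    assert(b != NULL);

    d = calloc(1, sizeof(*d));
    if (!d)
        goto pre_linking_error;   /* nothing linked yet: plain cleanup */
    d->id = id;

    toy_link(b, d);

    if (!parse_ok)
        goto post_linking_error;  /* d is already reachable from the chain */

    return 0;

post_linking_error:
    toy_unlink(b, d);             /* take it out of the chain first */
pre_linking_error:
    free(d);                      /* free(NULL) is a no-op */
    return -1;
}
```

Jumping straight to `pre_linking_error` after `toy_link()` would free `d` while `b->first` still points at it, so the next reader of the chain would dereference freed memory.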

Closes #3858

Fix three bugs triggered when SIP-I messages with binary ISUP data
are replicated across a dialog cluster with reinvite pinging enabled.

1. dlg_replicated_create: after _link_dlg_unsafe() links the dialog
   into the hash table, subsequent DLG_BIN_POP failures jumped to
   pre_linking_error which calls destroy_dlg() without unlinking.
   This leaves a dangling pointer in the hash chain — other workers
   dereference freed memory (GPF). Add post_linking_error label that
   calls unlink_unsafe_dlg() before destroy.

2. write_dialog_vars: the read lock on vals_lock was released between
   the sizing pass and the write pass. A concurrent store_dlg_value()
   (e.g. from persist_reinvite_pinging storing multipart SDP+ISUP
   bodies) can modify the vals list in between, causing a buffer
   overflow and corrupted serialization. Hold the read lock through
   both passes.

3. strip_esc: when len==1 and *c is backslash, *(c+1) reads one byte
   past the string. Add len>1 guard.
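
The guard from item 3 can be sketched on a length-delimited buffer as follows. This is an illustration, not the actual strip_esc: `toy_strip_esc` is a hypothetical name, and the set of escaped characters (`|`, `#`, `\`) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove the backslash from each "\|", "\#" or "\\" pair in a
 * length-delimited (not NUL-terminated) buffer, in place.
 * The escaped character set is an assumption made for this sketch.
 * Returns the new length.
 */
static size_t toy_strip_esc(char *s, size_t len)
{
    char *c = s;
    size_t left = len;   /* bytes remaining, starting at c */

    assert(s != NULL || len == 0);

    while (left > 0) {
        /* left > 1 guarantees that c[1] is still inside the buffer. */
        if (left > 1 && c[0] == '\\' &&
            (c[1] == '|' || c[1] == '#' || c[1] == '\\')) {
            memmove(c, c + 1, left - 1);  /* drop the backslash */
            len--;
            left--;
        }
        c++;
        left--;
    }
    return len;
}
```

Without the `left > 1` check, a buffer whose last byte is a backslash would have `c[1]` read one byte past its end, which is the case item 3 describes.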

Closes OpenSIPS#3858
@NormB NormB requested a review from liviuchircu March 31, 2026 17:39
@NormB NormB marked this pull request as ready for review April 1, 2026 11:33

github-actions Bot commented May 2, 2026

Any updates here? No progress has been made in the last 30 days, marking as stale.

@github-actions github-actions Bot added the stale label May 2, 2026

NormB commented May 2, 2026

bump

@stale stale Bot removed the stale label May 2, 2026
@bogdan-iancu bogdan-iancu self-assigned this May 5, 2026
Development

Successfully merging this pull request may close these issues.

[CRASH]Clusterer + SIP-I + Dialog (sharing)
