Skip to content

Commit 85087bb

Browse files
authored
Fix use-after-free triggered by fast actor reaping when the cycle detector is active (#4616)
The logic in the actor run and the cycle detector work together to enable fast reaping of actors with rc == 0. This relies on atomics and protecting the relevant areas of logic with critical sections. This logic unfortunately suffered from a use-after-free bug due to a race between the cycle detector receiving the block message and destroying the actor and the actor cycle detector critical flag being release as identified in #4614 which could sometimes lead to memory corruption. This commit changes things to remove the need to protect the logic with critical sections. It achieves this by ensuring that an actor with rc == 0 that the cycle detector knows about will never be rescheduled again even if the cycle detector happens to send it a message and the cycle detector is free to reap the actor when it receives the block message. The cycle detector ensures that the actor's message queue is empty or that the only messages pending are the expected ones from the cycle detector so it can safely destroy the actor. Resolves #4614
1 parent b5b9520 commit 85087bb

File tree

4 files changed

+52
-107
lines changed

4 files changed

+52
-107
lines changed

.release-notes/4616.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
## Fix use-after-free triggered by fast actor reaping when the cycle detector is active
2+
3+
The logic in the actor run and the cycle detector work together to enable fast reaping of actors with rc == 0. This relies on atomics and protecting the relevant areas of logic with critical sections. This logic unfortunately suffered from a use-after-free bug due to a race between the cycle detector receiving the block message and destroying the actor and the actor cycle detector critical flag being release as identified in #4614 which could sometimes lead to memory corruption.
4+
5+
This commit changes things to remove the need to protect the logic with critical sections. It achieves this by ensuring that an actor with rc == 0 that the cycle detector knows about will never be rescheduled again even if the cycle detector happens to send it a message and the cycle detector is free to reap the actor when it receives the block message. The cycle detector ensures that the actor's message queue is empty or that the only messages pending are the expected ones from the cycle detector so it can safely destroy the actor.

src/libponyrt/actor/actor.c

Lines changed: 24 additions & 69 deletions
Original file line numberDiff line numberDiff line change
@@ -712,71 +712,37 @@ bool ponyint_actor_run(pony_ctx_t* ctx, pony_actor_t* actor, bool polling)
712712
return !empty;
713713
}
714714
} else {
715-
// The cycle detector is running so we have to ensure that it doesn't
716-
// send a message to the actor while we're in the process of telling
717-
// it that it is safe to destroy the actor.
715+
// Tell cycle detector that this actor is a zombie and will not get
716+
// any more messages/work and can be reaped.
717+
// Mark the actor as FLAG_BLOCKED_SENT and send a BLOCKED message
718+
// to speed up reaping otherwise waiting for the cycle detector
719+
// to get around to asking if we're blocked could result in
720+
// unnecessary memory growth.
718721
//
719-
// Before we check if our queue is empty, we need to obtain the
720-
// "critical delete" atomic for this actor. The cycle detector will
721-
// bail from sending any messages if it can't obtain the atomic.
722-
// Similarly, if the actor can't obtain the atomic here, then we do not
723-
// attempt any "I can be destroyed" operations as the cycle detector is
724-
// in the process of sending us a message.
725-
if (ponyint_acquire_cycle_detector_critical(actor))
726-
{
727-
if(ponyint_messageq_isempty(&actor->q))
728-
{
729-
// At this point the actors queue is empty and the cycle detector
730-
// will not send it any more messages because we "own" the barrier
731-
// for sending cycle detector messages to this actor.
732-
ponyint_actor_setpendingdestroy(actor);
733-
734-
// Tell cycle detector that this actor is a zombie and will not get
735-
// any more messages/work and can be reaped.
736-
// Mark the actor as FLAG_BLOCKED_SENT and send a BLOCKED message
737-
// to speed up reaping otherwise waiting for the cycle detector
738-
// to get around to asking if we're blocked could result in
739-
// unnecessary memory growth.
740-
//
741-
// We're blocked, send block message telling the cycle detector
742-
// to reap this actor (because its `rc == 0`).
743-
// This is concurrency safe because, only the cycle detector might
744-
// have a reference to this actor (rc is 0) so another actor can not
745-
// send it an application message that results this actor becoming
746-
// unblocked (which would create a race condition) and we've also
747-
// ensured that the cycle detector will not send this actor any more
748-
// messages (which would also create a race condition).
749-
send_block(ctx, actor);
750-
751-
// mark the queue as empty or else destroy will hang
752-
bool empty = ponyint_messageq_markempty(&actor->q);
753-
754-
// make sure the queue is actually empty as expected
755-
pony_assert(empty);
756-
757-
// "give up" critical section ownership
758-
ponyint_release_cycle_detector_critical(actor);
759-
760-
// make sure the scheduler will not reschedule this actor
761-
return !empty;
762-
} else {
763-
// "give up" critical section ownership
764-
ponyint_release_cycle_detector_critical(actor);
765-
}
766-
}
722+
// We're blocked, send block message telling the cycle detector
723+
// to reap this actor (because its `rc == 0`).
724+
// This is concurrency safe because, only the cycle detector might
725+
// have a reference to this actor (rc is 0) so another actor can not
726+
// send it an application message that results this actor becoming
727+
// unblocked (which would create a race condition).
728+
send_block(ctx, actor);
729+
730+
// We have to ensure that this actor will not be rescheduled if the
731+
// cycle detector happens to send it a message.
732+
//
733+
// intentionally don't mark the queue as empty because we don't want
734+
// this actor to be rescheduled if the cycle detector sends it a message
735+
//
736+
// make sure the scheduler will not reschedule this actor
737+
return false;
767738
}
768739
} else {
769740
// gc is greater than 0
770741
if (!actor_noblock && !has_internal_flag(actor, FLAG_CD_CONTACTED))
771742
{
772743
// The cycle detector is running and we've never contacted it ourselves,
773-
// so let's it know we exist in case it is unaware.
774-
if (ponyint_acquire_cycle_detector_critical(actor))
775-
{
776-
send_block(ctx, actor);
777-
// "give up" critical section ownership
778-
ponyint_release_cycle_detector_critical(actor);
779-
}
744+
// so let it know we exist in case it is unaware.
745+
send_block(ctx, actor);
780746
}
781747
}
782748
}
@@ -1328,14 +1294,3 @@ size_t ponyint_actor_total_alloc_size(pony_actor_t* actor)
13281294
+ POOL_ALLOC_SIZE(pony_msg_t);
13291295
}
13301296
#endif
1331-
1332-
bool ponyint_acquire_cycle_detector_critical(pony_actor_t* actor)
1333-
{
1334-
uint8_t expected = 0;
1335-
return atomic_compare_exchange_strong_explicit(&actor->cycle_detector_critical, &expected, 1, memory_order_acq_rel, memory_order_acquire);
1336-
}
1337-
1338-
void ponyint_release_cycle_detector_critical(pony_actor_t* actor)
1339-
{
1340-
atomic_store_explicit(&actor->cycle_detector_critical, 0, memory_order_release);
1341-
}

src/libponyrt/actor/actor.h

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -56,8 +56,6 @@ typedef struct pony_actor_t
5656
messageq_t q;
5757
// sync flags are access from multiple scheduler threads concurrently
5858
PONY_ATOMIC(uint8_t) sync_flags;
59-
// accessed from the cycle detector and the actor
60-
PONY_ATOMIC(uint8_t) cycle_detector_critical;
6159

6260
// keep things accessed by other actors on a separate cache line
6361
alignas(64) heap_t heap; // 52/104 bytes

src/libponyrt/gc/cycle.c

Lines changed: 23 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -640,23 +640,7 @@ static void send_conf(pony_ctx_t* ctx, perceived_t* per)
640640

641641
while((view = ponyint_viewmap_next(&per->map, &i)) != NULL)
642642
{
643-
// The actor itself is also allowed to initiate a deletion of itself if
644-
// it is blocked and has a reference count of 0. Because deletion is
645-
// not an atomic operation, and the message queue remaining empty is an
646-
// invariant that can't be violated if the cycle detector is sending
647-
// a message to an actor, that actor isn't eligible to initiate getting
648-
// reaped in a short-circuit fashion. Only the actor or the cycle
649-
// detector can be in the "cycle detector critical" section for the actor.
650-
if (ponyint_acquire_cycle_detector_critical(view->actor))
651-
{
652-
// To avoid a race condition, we have to make sure that the actor isn't
653-
// pending destroy before sending a message. Failure to do so could
654-
// eventually result in a double free.
655-
if(!ponyint_actor_pendingdestroy(view->actor))
656-
pony_sendi(ctx, view->actor, ACTORMSG_CONF, per->token);
657-
658-
ponyint_release_cycle_detector_critical(view->actor);
659-
}
643+
pony_sendi(ctx, view->actor, ACTORMSG_CONF, per->token);
660644
}
661645
}
662646

@@ -808,23 +792,7 @@ static void check_blocked(pony_ctx_t* ctx, detector_t* d)
808792
// if it is not already blocked
809793
if(!view->blocked)
810794
{
811-
// The actor itself is also allowed to initiate a deletion of itself if
812-
// it is blocked and has a reference count of 0. Because deletion is
813-
// not an atomic operation, and the message queue remaining empty is an
814-
// invariant that can't be violated if the cycle detector is sending
815-
// a message to an actor, that actor isn't eligible to initiate getting
816-
// reaped in a short-circuit fashion. Only the actor or the cycle
817-
// detector can be in the "cycle detector critical" section for the actor.
818-
if (ponyint_acquire_cycle_detector_critical(view->actor))
819-
{
820-
// To avoid a race condition, we have to make sure that the actor isn't
821-
// pending destroy before sending a message. Failure to do so could
822-
// eventually result in a double free.
823-
if(!ponyint_actor_pendingdestroy(view->actor))
824-
pony_send(ctx, view->actor, ACTORMSG_ISBLOCKED);
825-
826-
ponyint_release_cycle_detector_critical(view->actor);
827-
}
795+
pony_send(ctx, view->actor, ACTORMSG_ISBLOCKED);
828796
}
829797

830798
// Stop if we've hit the max limit for # of actors to check
@@ -902,10 +870,29 @@ static void block(detector_t* d, pony_ctx_t* ctx, pony_actor_t* actor,
902870
ponyint_deltamap_free(map);
903871
}
904872

905-
// the actor should already be marked as pending destroy
906-
pony_assert(ponyint_actor_pendingdestroy(actor));
873+
// the actor should not already be marked as pending destroy
874+
pony_assert(!ponyint_actor_pendingdestroy(actor));
875+
876+
pony_msg_t* msg;
877+
878+
// the actor's queue wasn't marked empty because we didn't want the actor
879+
// to be rescheduled if the cycle detector sent it a message
880+
// make sure the actor's queue is empty and the only pending messages are
881+
// the expected ones from the cycle detector
882+
while(!ponyint_messageq_markempty(&actor->q))
883+
{
884+
while((msg = ponyint_actor_messageq_pop(&actor->q
885+
#ifdef USE_DYNAMIC_TRACE
886+
, ctx->scheduler, actor
887+
#endif
888+
)) != NULL)
889+
{
890+
pony_assert((msg->id == ACTORMSG_CONF) || (msg->id == ACTORMSG_ISBLOCKED));
891+
}
892+
}
907893

908894
// invoke the actor's finalizer and destroy it
895+
ponyint_actor_setpendingdestroy(actor);
909896
ponyint_actor_final(ctx, actor);
910897
ponyint_actor_sendrelease(ctx, actor);
911898
ponyint_actor_destroy(actor);

0 commit comments

Comments
 (0)