Commit bca2d41

Author: Dmitrii Kuvaiskii
[PAL/Linux-SGX] Inject signals into enclave in regular context
Previously, if the enclave was interrupted by a sync signal (e.g., SIGILL) or an async signal (e.g., SIGTERM), the untrusted-runtime signal handler injected the signal directly into the enclave. In particular, the untrusted runtime ran in the signal-handling context and called `sgx_raise()`, which performed EENTER to enter the in-enclave stage-1 signal handler and then EEXIT to exit the enclave back into the untrusted-runtime signal-handling context. The untrusted runtime then performed sigreturn to go back to its regular context, jumping to the AEP (Asynchronous Exit Pointer). At the AEP, ERESUME was called to resume enclave execution from the stage-2 signal handler.

In other words, the following invariants held:
- In-enclave stage-1 signal handler (in SSA 1) always executed in the signal-handling context of the untrusted runtime.
- In-enclave stage-2 signal handler (in SSA 0) always executed in regular context of the untrusted runtime.

As a preparation for AEX Notify support, this commit breaks the above strong coupling of contexts: the in-enclave stage-1 signal handler must execute in regular context of the untrusted runtime.

In particular, this commit changes the signal-handling logic as follows: instead of immediately delivering a sync/async signal into the enclave, the untrusted runtime's signal handler records the signal in a per-thread variable (`last_sync_event` or `last_async_event`) and returns. When the host kernel returns to regular context from the signal handler, it jumps to the AEP, which is augmented with new logic: it checks whether any signal is pending (i.e., whether `last_sync_event` or `last_async_event` is non-zero). If there is a pending signal, the new AEP logic performs EENTER so that the in-enclave stage-1 handler executes. After the stage-1 handler is done, it performs EEXIT, and the AEP logic finishes with ERESUME as usual. At this point the flow is the same as before: the enclave is resumed in the in-enclave stage-2 handler.

There is one corner case: an async signal can arrive while the enclave is executing the stage-1 handler (in SSA 1). In this case, the async signal flow is triggered in the untrusted runtime, and the AEP after the async signal tries to EENTER. Since SSA 1 is already executing inside the enclave and SSA 2 is forbidden by SGX hardware, this (nested) EENTER raises a #GP fault, which translates into SIGSEGV and is delivered to the untrusted runtime's signal handler. We augment the SIGSEGV (aka PAL_EVENT_MEMFAULT) signal handler to catch this particular case and ignore it: the async signal is recorded again in the `last_async_event` variable but cannot be delivered right now. This async signal will be delivered on some later AEX event.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
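The record-then-deliver pattern described in the commit message can be sketched outside of SGX with plain POSIX signals. This is a minimal illustration, not Gramine code: all `demo_` names and event values are hypothetical, a single static slot stands in for the per-thread TCB field, and a counter stands in for `sgx_raise()`. It relies on the GCC/Clang `__atomic` builtins, as the commit itself does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical event codes; Gramine uses enum pal_event instead. */
enum demo_event { DEMO_NO_EVENT = 0, DEMO_INTERRUPTED = 1, DEMO_QUIT = 2 };

/* Hypothetical stand-in for the per-thread pending-signal slot. */
static int g_demo_pending = DEMO_NO_EVENT;

/* Counts simulated deliveries; stands in for sgx_raise(event). */
static int g_demo_delivered_cnt = 0;

/* Signal-handling context: only record the event, never deliver it here. */
static void demo_handler(int signum) {
    int event = (signum == SIGTERM) ? DEMO_QUIT : DEMO_INTERRUPTED;
    __atomic_store_n(&g_demo_pending, event, __ATOMIC_RELAXED);
}

/* Installs the recording handler; returns 0 on success like sigaction(). */
int demo_install_handler(int signum) {
    struct sigaction sa;
    sa.sa_handler = demo_handler;
    sa.sa_flags   = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Regular context: atomically take the pending event and "deliver" it.
 * Returns the delivered event, or DEMO_NO_EVENT if nothing was pending. */
int demo_maybe_deliver_pending(void) {
    int event = __atomic_exchange_n(&g_demo_pending, DEMO_NO_EVENT, __ATOMIC_RELAXED);
    if (event != DEMO_NO_EVENT) {
        assert(event == DEMO_INTERRUPTED || event == DEMO_QUIT);
        g_demo_delivered_cnt++;
    }
    return event;
}
```

A caller would install the handler once, and then call `demo_maybe_deliver_pending()` at a well-defined point in regular context, which plays the role of the augmented AEP logic here.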
1 parent 0e7cac7 commit bca2d41

2 files changed: +116 -15 lines changed

pal/src/host/linux-sgx/host_exception.c

+114 -14
@@ -93,25 +93,76 @@ static bool interrupted_in_aex(void) {
 }

 static void handle_sync_signal(int signum, siginfo_t* info, struct ucontext* uc) {
-    enum pal_event event = signal_to_pal_event(signum);
-
     __UNUSED(info);

+    enum pal_event event = signal_to_pal_event(signum);
+    uint64_t rip = ucontext_get_ip(uc);
+
     /* send dummy signal to RPC threads so they interrupt blocked syscalls */
     if (g_rpc_queue)
         for (size_t i = 0; i < g_rpc_queue->rpc_threads_cnt; i++)
             DO_SYSCALL(tkill, g_rpc_queue->rpc_threads[i], SIGUSR2);

+    if (event == PAL_EVENT_MEMFAULT && interrupted_in_aex() && rip == (uint64_t)&eenter_pointer) {
+        /*
+         * This is a #GP on EENTER instruction inside sgx_raise(), called during AEX handling by
+         * maybe_raise_pending_signal(). This implies that some async signal arrived and was
+         * injected by AEX logic while the enclave thread is being executed in CSSA=1 (stage-1
+         * exception handler).
+         *
+         * We ignore this #GP fault by skipping EENTER. This newly arrived async signal will be
+         * delivered at some later AEX event, when the enclave thread starts executing in CSSA=0.
+         *
+         * Since last_async_event was reset to NO_EVENT before sgx_raise(), we must restore it to
+         * this failed-to-deliver async signal. We extract async signal number from RDI register.
+         * See also maybe_raise_pending_signal().
+         */
+        enum pal_event faulted_event = uc->uc_mcontext.rdi; /* convention, see .Lcssa1_exception */
+        if (faulted_event != PAL_EVENT_INTERRUPTED && faulted_event != PAL_EVENT_QUIT) {
+            log_error("#GP on EENTER instruction not because of async signal, impossible!");
+            BUG();
+        }
+        if (pal_get_host_tcb()->last_async_event != PAL_EVENT_QUIT) {
+            /* Do not overwrite `PAL_EVENT_QUIT`. For explanation, see handle_async_signal(). */
+            pal_get_host_tcb()->last_async_event = faulted_event;
+        }
+
+        ucontext_set_ip(uc, rip + /*sizeof(ENCLU)=*/3); /* skip EENTER */
+        return;
+    }
+
     if (interrupted_in_enclave(uc)) {
-        /* exception happened in app/LibOS/trusted PAL code, handle signal inside enclave */
+        /*
+         * Exception happened in app/LibOS/trusted PAL code, mark this sync signal as pending. This
+         * signal will be delivered right after this untrusted-runtime signal handler returns
+         * control to the AEX logic, which will call maybe_raise_pending_signal().
+         *
+         * We do not deliver the signal immediately to the enclave (but instead mark it as pending)
+         * because we want to support AEX Notify hardware feature in SGX. In particular, AEX Notify
+         * must execute in-enclave flows in regular context of the untrusted runtime, because AEX
+         * Notify uses EDECCSSA instruction to go from CSSA=1 context to CSSA=0 context (i.e., AEX
+         * Notify does not exit the SGX enclave and thus does not give an opportunity to the
+         * untrusted runtime to switch from signal-handling context to regular context).
+         *
+         * Therefore, we must execute the in-enclave stage-1 signal handler in regular context of
+         * the untrusted runtime. This is achieved by interposing on the AEX flow (which executes
+         * right after the host kernel hands control from this signal handler back to regular
+         * context).
+         *
+         * We don't need to use atomics when accessing last_sync_event since we are in the
+         * signal-handling context, and thus no other signal can arrive while we're here.
+         */
+        if (pal_get_host_tcb()->last_sync_event != PAL_EVENT_NO_EVENT) {
+            log_error("Nested sync signal, impossible!");
+            BUG();
+        }
+        pal_get_host_tcb()->last_sync_event = event;
+
         pal_get_host_tcb()->sync_signal_cnt++;
-        sgx_raise(event);
         return;
     }

     /* exception happened in untrusted PAL code (during syscall handling): fatal in Gramine */
-
-    unsigned long rip = ucontext_get_ip(uc);
     char buf[LOCATION_BUF_SIZE];
     pal_describe_location(rip, buf, sizeof(buf));

@@ -153,13 +204,11 @@ static void handle_async_signal(int signum, siginfo_t* info, struct ucontext* uc
         for (size_t i = 0; i < g_rpc_queue->rpc_threads_cnt; i++)
             DO_SYSCALL(tkill, g_rpc_queue->rpc_threads[i], SIGUSR2);

-    if (interrupted_in_enclave(uc) || interrupted_in_aex()) {
-        /* signal arrived while in app/LibOS/trusted PAL code or when handling another AEX, handle
-         * signal inside enclave */
+    if (interrupted_in_enclave(uc))
         pal_get_host_tcb()->async_signal_cnt++;
-        sgx_raise(event);
-        return;
-    }
+
+    /* see comments in handle_sync_signal() on why we do not deliver the signal immediately to the
+     * enclave (but instead mark it as pending) */

     assert(event == PAL_EVENT_INTERRUPTED || event == PAL_EVENT_QUIT);
     if (pal_get_host_tcb()->last_async_event != PAL_EVENT_QUIT) {
@@ -276,7 +325,7 @@ void pal_describe_location(uintptr_t addr, char* buf, size_t buf_size) {
 }

 #ifdef DEBUG
-/* called on each AEX and OCALL (in normal context), see host_entry.S */
+/* called on each AEX and OCALL (in regular context), see host_entry.S */
 void maybe_dump_and_reset_stats(void) {
     if (!g_sgx_enable_stats)
         return;
@@ -288,6 +337,57 @@ void maybe_dump_and_reset_stats(void) {
 }
 #endif

+/*
+ * The handle_sync_signal() and handle_async_signal() functions, executed in signal-handling
+ * context, added pending sync/async signal to the thread -- now the AEX flow, executed in regular
+ * context, must inform the enclave about these signals.
+ *
+ * This function is executed as part of the AEX flow, and may result in EENTER -> in-enclave stage-1
+ * signal handler -> EEXIT (if there is any pending signal, and enclave is not in the middle of
+ * another stage-1 signal handler). When the function returns, the AEX flow continues and ends up in
+ * ERESUME, that resumes "regular context" inside the enclave (which may be stage-2 signal handler).
+ *
+ * Only one of potentially two signals (one sync and one async) will be injected into the enclave at
+ * a time by this function. The hope is that the second (async) signal will be added at some later
+ * AEX event.
+ *
+ * Note that async signals are special in Gramine, there are only two of them: SIGCONT (aka
+ * PAL_EVENT_INTERRUPTED) which is dummy (can be ignored) and SIGTERM (aka PAL_EVENT_QUIT) which is
+ * injected only once anyway. Thus we don't need a queue of pending async signals, and a single slot
+ * for a pending async signal is sufficient (which is the `pal_get_host_tcb()->last_async_event`
+ * variable).
+ *
+ * Also note that new sync signals cannot occur while in this function, but new async signals can
+ * occur (since we are in regular context and cannot block async signals), thus handling async
+ * signals must be aware of concurrent signal handling code, i.e., last_async_event must be accessed
+ * atomically. We also access last_sync_event atomically, just for uniformity (though it is not
+ * strictly required).
+ */
 void maybe_raise_pending_signal(void) {
-    /* TODO: check if there is any sync or async pending signal and raise it */
+    enum pal_event event;
+
+    event = __atomic_exchange_n(&pal_get_host_tcb()->last_sync_event, PAL_EVENT_NO_EVENT,
+                                __ATOMIC_RELAXED);
+    if (event != PAL_EVENT_NO_EVENT) {
+        /*
+         * Sync event must always be consumed by the enclave. There is no scenario where the
+         * in-enclave stage-1 handling of another sync/async event would generate a sync event.
+         */
+        sgx_raise(event);
+        return;
+    }
+
+    event = __atomic_exchange_n(&pal_get_host_tcb()->last_async_event, PAL_EVENT_NO_EVENT,
+                                __ATOMIC_RELAXED);
+    if (event != PAL_EVENT_NO_EVENT) {
+        /*
+         * Async event may be *not* consumed by the enclave. This can happen if the enclave was
+         * already in the middle of stage-1 handler and thus EENTER generates #GP (because this
+         * EENTER would imply CSSA=2 which Gramine always programmes as prohibited in Intel SGX).
+         * In such case, this async event is ignored and will be delivered on some later AEX.
+         * See also handle_sync_signal().
+         */
+        sgx_raise(event);
+        return;
+    }
 }
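
To make the interplay between `maybe_raise_pending_signal()` and the #GP branch of `handle_sync_signal()` easier to follow, here is a small single-threaded model of it. It is only a sketch under assumptions: the `sim_` names and event values are hypothetical, the `cssa` argument stands in for the enclave thread's current SSA index, and a failed EENTER is modeled as a `false` return instead of a real #GP/SIGSEGV round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes standing in for enum pal_event. */
enum sim_event { SIM_NO_EVENT = 0, SIM_MEMFAULT, SIM_INTERRUPTED, SIM_QUIT };

/* Hypothetical stand-in for the two pending-signal slots in the host TCB. */
struct sim_tcb {
    int last_sync_event;
    int last_async_event;
};

/* Models EENTER: accepted at CSSA=0, rejected (the #GP case) at CSSA=1. */
static bool sim_eenter_ok(int cssa) {
    return cssa == 0;
}

/* Returns the event injected into the simulated enclave, or SIM_NO_EVENT.
 * At most one event is injected per call, sync before async. */
int sim_raise_pending(struct sim_tcb* tcb, int cssa) {
    assert(tcb != NULL);

    int event = __atomic_exchange_n(&tcb->last_sync_event, SIM_NO_EVENT, __ATOMIC_RELAXED);
    if (event != SIM_NO_EVENT)
        return event; /* sync events are assumed to always be accepted */

    event = __atomic_exchange_n(&tcb->last_async_event, SIM_NO_EVENT, __ATOMIC_RELAXED);
    if (event == SIM_NO_EVENT)
        return SIM_NO_EVENT;

    if (sim_eenter_ok(cssa))
        return event;

    /* Delivery failed: put the event back unless a QUIT was recorded meanwhile. */
    if (__atomic_load_n(&tcb->last_async_event, __ATOMIC_RELAXED) != SIM_QUIT)
        __atomic_store_n(&tcb->last_async_event, event, __ATOMIC_RELAXED);
    return SIM_NO_EVENT;
}
```

In this model a pending async event survives any number of calls made with `cssa == 1` and is handed over on the first call made with `cssa == 0`, which mirrors the "delivered on some later AEX" behavior described in the comments above.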

pal/src/host/linux-sgx/pal_tcb.h

+2 -1
@@ -99,7 +99,8 @@ typedef struct pal_host_tcb {
     atomic_ulong sync_signal_cnt; /* # of sync signals, corresponds to # of SIGSEGV/SIGILL/.. */
     atomic_ulong async_signal_cnt; /* # of async signals, corresponds to # of SIGINT/SIGCONT/.. */
     uint64_t profile_sample_time; /* last time sgx_profile_sample() recorded a sample */
-    int32_t last_async_event; /* last async signal, reported to the enclave on ocall return */
+    int32_t last_async_event; /* last async signal, reported to enclave on ocall return/AEX */
+    int32_t last_sync_event; /* last sync signal, reported to enclave on ocall return/AEX */
     int* start_status_ptr; /* pointer to return value of clone_thread */
     bool reset_stats; /* if true, dump SGX stats and reset them on next AEX/OCALL */
 } PAL_HOST_TCB;
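
The diff relies on a single `int32_t` slot per signal kind instead of a queue. The following sketch illustrates why one slot can be enough for async events under the rule visible in the diff (never overwrite a pending QUIT). The `slot_` names and event values are hypothetical, and the use of relaxed atomics for the store is an assumption based on the comment that `last_async_event` must be accessed atomically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event codes standing in for enum pal_event. */
enum { SLOT_NO_EVENT = 0, SLOT_INTERRUPTED = 1, SLOT_QUIT = 2 };

/* Hypothetical cut-down version of the two new/updated TCB fields. */
struct slot_tcb {
    int32_t last_async_event;
    int32_t last_sync_event;
};

/* Records an async event in the single slot. Repeated INTERRUPTED events
 * coalesce into one, and a pending QUIT is never downgraded. */
void slot_record_async(struct slot_tcb* tcb, int32_t event) {
    assert(tcb != NULL);
    assert(event == SLOT_INTERRUPTED || event == SLOT_QUIT);

    if (__atomic_load_n(&tcb->last_async_event, __ATOMIC_RELAXED) != SLOT_QUIT)
        __atomic_store_n(&tcb->last_async_event, event, __ATOMIC_RELAXED);
}
```

Because INTERRUPTED carries no payload and QUIT only needs to be seen once, losing intermediate INTERRUPTED events in this scheme does not lose information.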
