Commit 56310a1

[WIP] [LibOS] Add support for timerfd system calls
This commit adds support for the system calls that create and operate on a timer that delivers expiration notifications via a file descriptor: `timerfd_create()`, `timerfd_settime()` and `timerfd_gettime()`.

The timerfd object is created inside Gramine, and all its operations are resolved entirely inside Gramine. Each timerfd object is associated with a dummy eventfd created on the host, which is used only to trigger notifications (e.g., in epoll).

The emulation is currently implemented at the level of a single process. It may nevertheless work for some multi-process applications, e.g., if the child process inherits the timerfd object but doesn't use it; to support these cases, this commit introduces the `sys.experimental__allow_timerfd_fork` manifest option.

LibOS regression tests are also added.

Signed-off-by: Kailun Qin <[email protected]>
1 parent a933017 commit 56310a1

33 files changed: +1008 -46 lines changed
Documentation/devel/features.md

Lines changed: 20 additions & 8 deletions
@@ -1036,7 +1036,7 @@ The below list is generated from the [syscall table of Linux
 - `signalfd()`
 <sup>[7](#signals-and-process-state-changes)</sup>
 
-- `timerfd_create()`
+- `timerfd_create()`
 <sup>[20](#sleeps-timers-and-alarms)</sup>
 
 - `eventfd()`
@@ -1045,10 +1045,10 @@ The below list is generated from the [syscall table of Linux
 - `fallocate()`
 <sup>[9a](#file-system-operations)</sup>
 
-- `timerfd_settime()`
+- `timerfd_settime()`
 <sup>[20](#sleeps-timers-and-alarms)</sup>
 
-- `timerfd_gettime()`
+- `timerfd_gettime()`
 <sup>[20](#sleeps-timers-and-alarms)</sup>
 
 - `accept4()`
@@ -2862,9 +2862,21 @@ Gramine implements getting and setting the interval timer: `getitimer()` and `se
 
 Gramine implements alarm clocks via `alarm()`.
 
+Gramine implements timers that notify via file descriptors: `timerfd_create()`, `timerfd_settime()`
+and `timerfd_gettime()`. The timerfd object is created inside Gramine, and all operations are
+resolved entirely inside Gramine. Each timerfd object is associated with a dummy eventfd created on
+the host. This is purely for triggering read/write notifications (e.g., in epoll); timerfd data is
+verified inside Gramine and is never exposed to the host. Since the host is used purely for
+notifications, a malicious host can only induce Denial of Service (DoS) attacks.
+
+The emulation is currently implemented at the level of a single process. The emulation may work for
+multi-process applications, e.g., if the child process inherits the timerfd object but doesn't use
+it. However, multi-process support is brittle and thus disabled by default (Gramine will issue a
+warning). To enable it still, set the [`sys.experimental__allow_timerfd_fork` manifest
+option](../manifest-syntax.html#allowing-timerfd-in-multi-process-applications).
+
 Gramine does *not* currently implement the POSIX per-process timer: `timer_create()`, etc. Gramine
-also does not currently implement timers that notify via file descriptors. Gramine could implement
-these timers in the future, if need arises.
+could implement it in the future, if need arises.
 
 <details><summary>Related system calls</summary>
 
@@ -2880,9 +2892,9 @@ these timers in the future, if need arises.
 - `timer_getoverrun()`: may be implemented in the future
 - `timer_delete()`: may be implemented in the future
 
-- `timerfd_create()`: may be implemented in the future
-- `timerfd_settime()`: may be implemented in the future
-- `timerfd_gettime()`: may be implemented in the future
+- `timerfd_create()`: see notes above
+- `timerfd_settime()`: see notes above
+- `timerfd_gettime()`: see notes above
 
 </details><br />
 
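For context, the three system calls documented above form the standard Linux timerfd interface. The following is a minimal application-side sketch, not code from this commit: it arms a one-shot timer, waits for the descriptor to become readable with `poll()`, and reads the expiration count. The helper name `wait_for_timerfd` is made up for illustration; the calls themselves are the regular Linux API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: arms a one-shot timer for `delay_ms` milliseconds, waits for it with
 * poll(), and returns the expiration count read from the descriptor (or -1 on error). */
static int64_t wait_for_timerfd(long delay_ms) {
    assert(delay_ms > 0);

    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (fd < 0)
        return -1;

    struct itimerspec spec = {
        .it_value    = { .tv_sec = delay_ms / 1000, .tv_nsec = (delay_ms % 1000) * 1000000L },
        .it_interval = { .tv_sec = 0, .tv_nsec = 0 }, /* zero interval: one-shot timer */
    };
    if (timerfd_settime(fd, /*flags=*/0, &spec, NULL) < 0) {
        close(fd);
        return -1;
    }

    int64_t result = -1;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    if (poll(&pfd, 1, /*timeout=*/-1) == 1 && (pfd.revents & POLLIN)) {
        /* A successful read yields an 8-byte count of expirations since the last read. */
        uint64_t expirations = 0;
        if (read(fd, &expirations, sizeof(expirations)) == (ssize_t)sizeof(expirations))
            result = (int64_t)expirations;
    }

    close(fd);
    return result;
}
```

Calling `wait_for_timerfd(20)` on Linux blocks for roughly 20 ms and returns the number of expirations that were read.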

Documentation/manifest-syntax.rst

Lines changed: 16 additions & 0 deletions
@@ -364,6 +364,22 @@ Python). Could be useful in SGX environments: child processes consume
 to achieve this, you need to run the whole Gramine inside a proper security
 sandbox.
 
+.. _timerfd-in-multi-process:
+
+Allowing timerfd in multi-process applications
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+    sys.experimental__allow_timerfd_fork = [true|false]
+    (Default: false)
+
+Gramine implements timerfd in a secure way, but this implementation works only
+in single-process applications. If you have a multi-process application and you
+are sure that the parent process and its child processes do not have
+cross-process usage of timerfd, you can use
+``sys.experimental__allow_timerfd_fork`` manifest syntax.
+
 Root FS mount point
 ^^^^^^^^^^^^^^^^^^^
 
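To make "no cross-process usage of timerfd" concrete, here is a hedged sketch of the kind of application pattern the option is aimed at: the child inherits the descriptor across `fork()` but closes it without reading, polling or re-arming it, and only the parent consumes the timer. The function name is hypothetical, and whether a given application runs correctly under Gramine with this option enabled is not something this sketch can guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical example: the parent creates and arms a timerfd, then forks. The child never
 * touches the inherited descriptor apart from closing it.
 * Returns the expiration count seen by the parent, or -1 on error. */
static int64_t parent_only_timerfd(void) {
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;

    /* One-shot timer, 10 ms from now. */
    struct itimerspec spec = { .it_value = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 } };
    if (timerfd_settime(fd, 0, &spec, NULL) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        close(fd); /* child: inherited, but never read, polled or re-armed */
        _exit(0);
    }

    /* Parent: blocking read returns once the timer has expired. */
    uint64_t expirations = 0;
    ssize_t n = read(fd, &expirations, sizeof(expirations));
    close(fd);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return n == (ssize_t)sizeof(expirations) ? (int64_t)expirations : -1;
}
```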

libos/include/libos_fs.h

Lines changed: 5 additions & 0 deletions
@@ -182,6 +182,10 @@ struct libos_fs_ops {
     /* Poll a single handle. Must not block. */
     int (*poll)(struct libos_handle* hdl, int in_events, int* out_events);
 
+    /* Verify a single handle after poll. Must update `pal_ret_events` in-place with only allowed
+     * ones. Used in e.g. secure timerfd FS. */
+    void (*post_poll)(struct libos_handle* hdl, pal_wait_flags_t* pal_ret_events);
+
     /* checkpoint/migrate the file system */
     ssize_t (*checkpoint)(void** checkpoint, void* mount_data);
     int (*migrate)(void* checkpoint, void** mount_data);
@@ -930,6 +934,7 @@ extern struct libos_fs eventfd_builtin_fs;
 extern struct libos_fs synthetic_builtin_fs;
 extern struct libos_fs path_builtin_fs;
 extern struct libos_fs shm_builtin_fs;
+extern struct libos_fs timerfd_builtin_fs;
 
 struct libos_fs* find_fs(const char* name);
 

libos/include/libos_handle.h

Lines changed: 12 additions & 0 deletions
@@ -46,6 +46,7 @@ enum libos_handle_type {
     /* Special handles: */
     TYPE_EPOLL, /* epoll handles, see `libos_epoll.c` */
     TYPE_EVENTFD, /* eventfd handles, used by `eventfd` filesystem */
+    TYPE_TIMERFD, /* timerfd handles, used by `timerfd` filesystem */
 };
 
 struct libos_pipe_handle {
@@ -134,6 +135,16 @@ struct libos_epoll_handle {
     size_t last_returned_index;
 };
 
+struct libos_timerfd_handle {
+    spinlock_t expiration_lock; /* protecting below fields */
+    uint64_t num_expirations;
+    uint64_t dummy_host_val;
+
+    spinlock_t timer_lock;
+    uint64_t timeout;
+    uint64_t reset;
+};
+
 struct libos_handle {
     enum libos_handle_type type;
     bool is_dir;
@@ -204,6 +215,7 @@ struct libos_handle {
 
         struct libos_epoll_handle epoll; /* TYPE_EPOLL */
         struct { bool is_semaphore; } eventfd; /* TYPE_EVENTFD */
+        struct libos_timerfd_handle timerfd; /* TYPE_TIMERFD */
     } info;
 
     struct libos_dir_handle dir_info;
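The `num_expirations` field guarded by `expiration_lock` suggests a counter that the timer callback increments and that `read()` consumes. Below is a hedged sketch of that pattern following the Linux timerfd read semantics (a read returns the count accumulated since the previous read and resets it). It is not the commit's code: the struct, the function names and the `atomic_flag`-based spinlock are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified counterpart of the expiration half of the timerfd handle. */
struct mock_expiration_counter {
    atomic_flag lock;         /* stand-in for `spinlock_t expiration_lock` */
    uint64_t num_expirations; /* protected by `lock` */
};

static void counter_lock(struct mock_expiration_counter* c) {
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void counter_unlock(struct mock_expiration_counter* c) {
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Called from the timer callback: records `n` new expirations. */
static void counter_expire(struct mock_expiration_counter* c, uint64_t n) {
    counter_lock(c);
    c->num_expirations += n;
    counter_unlock(c);
}

/* Non-blocking read: stores the pending count in `*out` and resets it to zero.
 * Returns 0 on success or -EAGAIN if no expiration is pending. */
static int counter_read(struct mock_expiration_counter* c, uint64_t* out) {
    assert(c && out);

    int ret = -EAGAIN;
    counter_lock(c);
    if (c->num_expirations > 0) {
        *out = c->num_expirations;
        c->num_expirations = 0;
        ret = 0;
    }
    counter_unlock(c);
    return ret;
}
```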

libos/include/libos_internal.h

Lines changed: 4 additions & 0 deletions
@@ -262,3 +262,7 @@ int init_stack(const char* const* argv, const char* const* envp, char*** out_arg
  * The implementation of this function depends on the used architecture.
  */
 noreturn void call_elf_entry(elf_addr_t entry, void* argp);
+
+extern bool g_timerfd_allow_fork;
+extern uint32_t g_timerfd_cnt;
+int init_timerfd(void);

libos/include/libos_table.h

Lines changed: 4 additions & 0 deletions
@@ -206,3 +206,7 @@ long libos_syscall_getcpu(unsigned* cpu, unsigned* node, void* unused_cache);
 long libos_syscall_getrandom(char* buf, size_t count, unsigned int flags);
 long libos_syscall_mlock2(unsigned long start, size_t len, int flags);
 long libos_syscall_sysinfo(struct sysinfo* info);
+long libos_syscall_timerfd_create(int clockid, int flags);
+long libos_syscall_timerfd_settime(int fd, int flags, const struct __kernel_itimerspec* value,
+                                   struct __kernel_itimerspec* ovalue);
+long libos_syscall_timerfd_gettime(int fd, struct __kernel_itimerspec* value);

libos/include/libos_utils.h

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ int create_pipe(char* name, char* uri, size_t size, PAL_HANDLE* hdl, bool use_vm
 
 /* Asynchronous event support */
 int init_async_worker(void);
-int64_t install_async_event(PAL_HANDLE object, unsigned long time,
+int64_t install_async_event(PAL_HANDLE object, unsigned long time, bool absolute_time,
                             void (*callback)(IDTYPE caller, void* arg), void* arg);
 struct libos_thread* terminate_async_worker(void);
 
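The new `absolute_time` parameter presumably lets callers such as `timerfd_settime()` with `TFD_TIMER_ABSTIME` pass a point in time rather than an offset. The diff does not show how the async worker interprets it, so the helper below is only an illustrative sketch of the usual relative-versus-absolute conversion. The function name and the choice of microseconds as the unit are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the commit: turns the `time` argument of an async event
 * into an absolute deadline. Microseconds are an assumed unit. */
static uint64_t async_event_deadline_us(uint64_t now_us, uint64_t time_us, bool absolute_time) {
    if (absolute_time)
        return time_us;      /* caller already supplied a point in time */
    if (time_us > UINT64_MAX - now_us)
        return UINT64_MAX;   /* saturate instead of wrapping around */
    return now_us + time_us; /* relative: offset from the current time */
}
```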

libos/include/linux_abi/timerfd.h

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: LGPL-3.0-or-later */
+/* Copyright (C) 2024 Intel Corporation
+ * Kailun Qin <[email protected]>
+ */
+
+#pragma once
+
+/* Types and structures used by various Linux ABIs (e.g. syscalls). */
+/* These need to be binary-identical with the ones used by Linux. */
+
+#include <linux/timerfd.h>
+
+#define TFD_SHARED_FCNTL_FLAGS (TFD_CLOEXEC | TFD_NONBLOCK)
+/* Flags for timerfd_create. */
+#define TFD_CREATE_FLAGS TFD_SHARED_FCNTL_FLAGS
+/* Flags for timerfd_settime. */
+#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET)
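Flag masks like these are typically used to reject unknown bits with `EINVAL`, which is also what Linux does for `timerfd_create()` and `timerfd_settime()`. The sketch below shows that check in isolation. It is an assumption about how the masks are consumed, not code from the commit, and the `EXAMPLE_*` macros and `check_*` functions are hypothetical names. It includes `<sys/timerfd.h>` instead of the kernel header so that it builds as an ordinary userspace program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/timerfd.h>

#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1) /* fallback for older libc headers */
#endif

/* Same masks as in the header above, under hypothetical EXAMPLE_* names. */
#define EXAMPLE_CREATE_FLAGS  (TFD_CLOEXEC | TFD_NONBLOCK)
#define EXAMPLE_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET)

/* Returns 0 if `flags` contains only bits valid for timerfd_create(), else -EINVAL. */
static int check_timerfd_create_flags(int flags) {
    return (flags & ~EXAMPLE_CREATE_FLAGS) ? -EINVAL : 0;
}

/* Returns 0 if `flags` contains only bits valid for timerfd_settime(), else -EINVAL. */
static int check_timerfd_settime_flags(int flags) {
    return (flags & ~EXAMPLE_SETTIME_FLAGS) ? -EINVAL : 0;
}
```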

libos/src/arch/x86_64/libos_table.c

Lines changed: 3 additions & 3 deletions
@@ -297,11 +297,11 @@ libos_syscall_t libos_syscall_table[LIBOS_SYSCALL_BOUND] = {
     [__NR_utimensat] = (libos_syscall_t)0, // libos_syscall_utimensat
     [__NR_epoll_pwait] = (libos_syscall_t)libos_syscall_epoll_pwait,
     [__NR_signalfd] = (libos_syscall_t)0, // libos_syscall_signalfd
-    [__NR_timerfd_create] = (libos_syscall_t)0, // libos_syscall_timerfd_create
+    [__NR_timerfd_create] = (libos_syscall_t)libos_syscall_timerfd_create,
     [__NR_eventfd] = (libos_syscall_t)libos_syscall_eventfd,
     [__NR_fallocate] = (libos_syscall_t)libos_syscall_fallocate,
-    [__NR_timerfd_settime] = (libos_syscall_t)0, // libos_syscall_timerfd_settime
-    [__NR_timerfd_gettime] = (libos_syscall_t)0, // libos_syscall_timerfd_gettime
+    [__NR_timerfd_settime] = (libos_syscall_t)libos_syscall_timerfd_settime,
+    [__NR_timerfd_gettime] = (libos_syscall_t)libos_syscall_timerfd_gettime,
     [__NR_accept4] = (libos_syscall_t)libos_syscall_accept4,
     [__NR_signalfd4] = (libos_syscall_t)0, // libos_syscall_signalfd4
     [__NR_eventfd2] = (libos_syscall_t)libos_syscall_eventfd2,

libos/src/fs/libos_fs.c

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ static struct libos_fs* g_builtin_fs[] = {
     &synthetic_builtin_fs,
     &path_builtin_fs,
     &shm_builtin_fs,
+    &timerfd_builtin_fs,
 };
 
 static struct libos_lock g_mount_mgr_lock;
