[LibOS,common] Add file recovery support for encrypted files #2082

kailun-qin · 2025-01-07T06:38:26Z

Description of the changes

Previously, a fatal error during writes to encrypted files could cause file corruption due to incorrect GMACs and/or encryption keys.

To address this, we introduce a file recovery mechanism using a "shadow" recovery file that stores data about to change and a has_pending_write flag in the metadata node indicating the start of a write transaction. During file flush, all cached blocks that are about to change are saved to the recovery file in the format of physical node numbers (offsets) plus encrypted block data. Before saving the main file contents, the has_pending_write flag is set in the file's metadata node and cleared only when the transaction is complete. If an encrypted file is opened and the has_pending_write flag is set, a recovery process starts to revert partial changes using the recovery file, returning to the last known good state. The "shadow" recovery file is cleaned up on file close.
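The transaction described above can be sketched as a small in-memory simulation. This is only an illustration of the flush-then-recover ordering, not the PR's code: `sim_file_t`, `sim_flush()`, `sim_recover_if_needed()` and the tiny node size are hypothetical names and values chosen for the example; only `has_pending_write` and the "node number plus block data" record format come from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_NODE_SIZE 8 /* tiny nodes for illustration only */
#define SIM_MAX_NODES 4

/* One recovery record: physical node number plus the node's previous contents. */
typedef struct {
    uint64_t physical_node_number;
    uint8_t  bytes[SIM_NODE_SIZE];
} sim_recovery_node_t;

/* Hypothetical in-memory stand-in for the main file and its "shadow" recovery file. */
typedef struct {
    bool                has_pending_write; /* lives in the metadata node */
    uint8_t             nodes[SIM_MAX_NODES][SIM_NODE_SIZE];
    sim_recovery_node_t recovery[SIM_MAX_NODES];
    size_t              recovery_count;
} sim_file_t;

/* Flush `n` dirty nodes, filling each with `fill`. The flush "crashes" before writing
 * the node at index `crash_at`; pass a value >= n for a flush that completes. */
static bool sim_flush(sim_file_t* f, const uint64_t* dirty, size_t n, uint8_t fill,
                      size_t crash_at) {
    assert(n <= SIM_MAX_NODES);

    /* 1. Rewrite the recovery file with the current contents of the nodes about to change. */
    f->recovery_count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(dirty[i] < SIM_MAX_NODES);
        sim_recovery_node_t* r = &f->recovery[f->recovery_count++];
        r->physical_node_number = dirty[i];
        memcpy(r->bytes, f->nodes[dirty[i]], SIM_NODE_SIZE);
    }

    /* 2. Mark the start of the write transaction. */
    f->has_pending_write = true;

    /* 3. Write the new contents to the main file. */
    for (size_t i = 0; i < n; i++) {
        if (i == crash_at)
            return false; /* simulated fatal error in the middle of the write */
        memset(f->nodes[dirty[i]], fill, SIM_NODE_SIZE);
    }

    /* 4. Transaction complete: clear the flag. */
    f->has_pending_write = false;
    return true;
}

/* On open: if a write was pending, revert to the last known good state. */
static bool sim_recover_if_needed(sim_file_t* f) {
    if (!f->has_pending_write)
        return false;
    for (size_t i = 0; i < f->recovery_count; i++) {
        const sim_recovery_node_t* r = &f->recovery[i];
        memcpy(f->nodes[r->physical_node_number], r->bytes, SIM_NODE_SIZE);
    }
    f->has_pending_write = false;
    return true;
}
```

The key ordering is that the recovery records are written before the flag is set, and the flag is set before the main file is touched, so a crash at any later point leaves enough information to roll back.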

This commit adds a new mount parameter enable_recovery = [true|false] for encrypted files mounts to optionally enable this feature. We extend the file flush logic of protected files (pf) to include the recovery file dump and the setting/unsetting of the has_pending_write flag. We also extend pf_open() to make the pf aware of the underlying recovery file managed by LibOS, and to include an optional recovery check and initiate recovery if needed.

Fixes #2013.

How to test this PR?

CI + manual testing.


kailun-qin · 2025-01-07T07:33:31Z

Jenkins, retest Jenkins-Direct-24.04-Debug please (fdatasync01 from LTP timed out, known and unrelated to the PR)

ynonflumintel · 2025-01-09T17:54:25Z

libos/include/libos_fs.h

+
+    /* Whether to enable file recovery (used by `chroot_encrypted` filesystem), false if not
+     * applicable */
+    bool enable_recovery;


what is the behavior if recovery is enabled, disabled and re-enabled?
do you remove the old shadow files when mounting?

In the first "recovery enabled" run, if the app terminates abruptly, a shadow file will be generated. If recovery is "disabled" in the next run, the shadow file will remain and will not be accessed. When recovery is "re-enabled" in a subsequent run, the recovery file will not be removed upon mounting. However, it will be overwritten during flush() and removed upon closing.

I think you can't assume that you can replay the shadow file once the flag has been turned off: if new data is written to the same offset in the file while the flag is off and you then re-enable it, the file won't be consistent.

Hmm... Good point. Do we consider this a legitimate usage? If so, alternatively, we could use enable_recovery to control whether a backup is needed on flush, but still perform the file recovery process as long as the update_flag is set (i.e., we restore to the last known good state even during a "disabled" run if a previous run was abruptly terminated).

I think a better approach is either not allowing non-recoverable mounts after mounting once with recovery, or removing the shadow files in the above scenario since the user has (hopefully knowingly) disabled recovery.

Thanks for the input! I go w/ the first approach -- not allowing non-recoverable mounts if a recovery is needed.
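A minimal sketch of the policy chosen above (refuse a non-recoverable mount when the file still has a pending write). The helper `decide_open_action()`, the `open_action_t` enum and the `-EACCES` error code are all hypothetical and only illustrate the decision; the actual check in the PR may be placed and reported differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    OPEN_NORMALLY,       /* no interrupted write, nothing to do */
    OPEN_AFTER_RECOVERY, /* roll back using the recovery file first */
} open_action_t;

/* Hypothetical helper: decide how to open an existing encrypted file. */
static int decide_open_action(bool enable_recovery, bool has_pending_write,
                              open_action_t* out_action) {
    assert(out_action != NULL);

    if (!has_pending_write) {
        *out_action = OPEN_NORMALLY;
        return 0;
    }

    /* A previous run was interrupted in the middle of a write. Opening the file without
     * recovery could later make the stale recovery file inconsistent with the main file,
     * so the open is refused (the error code is a placeholder). */
    if (!enable_recovery)
        return -EACCES;

    *out_action = OPEN_AFTER_RECOVERY;
    return 0;
}
```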

FYI: I dismissed @ynonflumintel block here, as they are not a maintainer and are not responding.

ynonflumintel · 2025-01-09T17:56:10Z

libos/include/libos_fs_encrypted.h

 *
 * `uri` must not correspond to an existing file.
 *
 * The newly created `libos_encrypted_file` object will have `use_count` set to 1.
 */
 int encrypted_file_create(const char* uri, mode_t perm, struct libos_encrypted_files_key* key,
-                          struct libos_encrypted_file** out_enc);
+                          bool enable_recovery, struct libos_encrypted_file** out_enc);


you may want to have some const fs_configuration struct that's passed around, assuming more flags are going to be introduced in the future

Yeah, I'm okay with having a fs_configuration struct, but currently, we only have this enable_recovery option.

I left it as is for now.

FYI: I dismissed @ynonflumintel block here, as they are not a maintainer and are not responding.

ynonflumintel · 2025-01-09T17:59:42Z

pal/src/host/linux-common/file_utils.c

+    }
+
+    for (size_t i = 0; i < nodes_count; i++) {
+        ret = read_all(recovery_file_fd, recovery_node, recovery_node_size);


Is there an upper bound for the amount of data written in a shadow file?
do you attempt to allocate memory for and read from the full shadow file here?

Is there an upper bound for the amount of data written in a shadow file?

The upper bound for the data that can be written to a shadow file should be the same as on a typical Linux system. This is determined by e.g., the fs type, available disk space, and the maximum file size supported by the fs.

do you attempt to allocate memory for and read from the full shadow file here?

Sorry, I don't understand this concern. Pls note that this piece of code is on the untrusted side of Gramine.

If I understand correctly, there's a malloc call which might attempt to allocate TBs if the size is not limited

Ah, I understand your concern now. I was referring to the theoretical upper bound earlier. Actually, during each flush, only the data that is about to change in the encrypted files cache (which has a default size of 192KB as specified here) will be saved and rewritten to the recovery file.
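As a rough worked example of that bound: assuming a 4 KiB node size and a recovery record made of an 8-byte node number followed by the node data (both are assumptions for this sketch), a 192KB cache gives an upper estimate of about 197 KB per flush. The helper name is hypothetical, and nodes kept outside the cache (such as the metadata node) are ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical estimate: every cached node is dirty and gets one recovery record
 * consisting of a uint64_t node number plus the node contents. */
static uint64_t max_recovery_bytes_per_flush(uint64_t cache_bytes, uint64_t node_size) {
    assert(node_size > 0);
    uint64_t cached_nodes = cache_bytes / node_size;
    return cached_nodes * (sizeof(uint64_t) + node_size);
}
```

With `cache_bytes = 192 * 1024` and `node_size = 4096` this is 48 records of 4104 bytes each, i.e. 196992 bytes.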

FYI: I dismissed @ynonflumintel block here, as they are not a maintainer and are not responding.

efu39

Reviewed 26 of 27 files at r1, all commit messages.
Reviewable status: 26 of 27 files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @ynonflumintel)

libos/src/fs/libos_fs_encrypted.c line 277 at r1 (raw file):

            if (recovery_needed) {
                log_warning("file recovery needed but failed");

suggest changing the message to 'file recovery attempted but failed' for clarity.

libos/src/fs/libos_fs_encrypted.c line 328 at r1 (raw file):

    if (enc->recovery_file_pal_handle)
        (void)PalStreamDelete(enc->recovery_file_pal_handle, PAL_DELETE_ALL);

wondering if PalStreamDelete(enc->recovery_file_pal_handle, ..) should be invoked as well when pf_close() fails above?

common/src/protected_files/protected_files.c line 452 at r1 (raw file):

    assert(pf->host_recovery_file_handle);

    uint64_t offset = 0;

nitpicking: maybe moving the 'offset' declaration a few lines back, together with 'node'

common/src/protected_files/protected_files.c line 522 at r1 (raw file):

                pf->file_status = PF_STATUS_FLUSH_ERROR;
                DEBUG_PF("failed to write changes to the recovery file");
                return false;

maybe use "goto recoverable_error;" instead for consistency?

common/src/protected_files/protected_files.c line 528 at r1 (raw file):

                pf->file_status = PF_STATUS_FLUSH_ERROR;
                DEBUG_PF("failed to set the update flag");
                return false;

same as above

mkow

Reviewed 8 of 27 files at r1, all commit messages.
Reviewable status: 26 of 27 files reviewed, 20 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @ynonflumintel)

a discussion (no related file):
We probably should benchmark this before merging (see #2013 (comment)).

a discussion (no related file):
Do I understand correctly (at least looking at #2013 (comment)) that the recovery file stacks all changes that happened while the file was open? This may be a lot for long-running enclaves, even if the file on the disk is itself small, but often modified?

common/src/protected_files/protected_files.h line 222 at r1 (raw file):

 * \param      create                Overwrite file contents if true.
 * \param      key                   Wrap key.
 * \param      recovery_file_handle  (optional)Underlying recovery file handle.

Suggestion:

(optional) Underlying recovery file handle.

common/src/protected_files/protected_files.h line 313 at r1 (raw file):

 * \param      pf               PF context.
 * \param[out] recovery_needed  (optional)Whether recovery is needed for \p pf.
 * \param[out] pos_size         (optional)Size of the \p pf node position.

I don't understand this term, "size of position" - is this the size in bytes of a number indicating the position in the file? But that doesn't make much sense.

Code quote:

Size of the \p pf node position.

common/src/protected_files/protected_files.h line 314 at r1 (raw file):

 * \param[out] recovery_needed  (optional)Whether recovery is needed for \p pf.
 * \param[out] pos_size         (optional)Size of the \p pf node position.
 * \param[out] node_size        (optional)Size of the \p pf node data.

ditto (space)

common/src/protected_files/protected_files.h line 319 at r1 (raw file):

 */
pf_status_t pf_get_recovery_info(pf_context_t* pf, bool* recovery_needed, size_t* pos_size,
                                 size_t* node_size);

please add out_ prefix to the out arguments

common/src/protected_files/protected_files_format.h line 59 at r1 (raw file):

    pf_nonce_t metadata_key_nonce;
    pf_mac_t   metadata_mac; /* GCM mac */
    uint8_t    update_flag; /* for file recovery */

https://gramine.readthedocs.io/en/latest/devel/encfiles.html will need an update

common/src/protected_files/protected_files_format.h line 59 at r1 (raw file):

    pf_nonce_t metadata_key_nonce;
    pf_mac_t   metadata_mac; /* GCM mac */
    uint8_t    update_flag; /* for file recovery */

or something similar, update_flag sounds very unclear what it means

Suggestion:

has_pending_write

pal/include/pal/pal.h line 1050 at r1 (raw file):

 *
 * \param handle     Handle to the file.
 * \param handle     Handle to the recovery file.

Parameters don't match the signature.

pal/include/pal/pal.h line 1051 at r1 (raw file):

 * \param handle     Handle to the file.
 * \param handle     Handle to the recovery file.
 * \param pos_size   Size of the pf node position.

ditto, what's a "size of a position"?

Code quote:

Size of the pf node position.

pal/include/pal/pal.h line 1056 at r1 (raw file):

 * \returns 0 on success, negative error code on failure.
 */
int PalEncryptedFileRecovery(PAL_HANDLE file_handle, PAL_HANDLE recovery_file_handle,

This sounds like a bad function name, there's no verb in it and I was confused what it actually does before reading the more detailed documentation.

Code quote:

int PalEncryptedFileRecovery(

Documentation/manifest-syntax.rst line 1158 at r1 (raw file):

The ``enable_recovery`` mount parameter determines whether file recovery is
enabled for the mount. If omitted, it defaults to ``false``. This feature allows

Please be concise - instead of this just add "(default: false)" after the first mention of this option.

Code quote:

If omitted, it defaults to ``false``.

kailun-qin

Reviewable status: 7 of 34 files reviewed, 20 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @efu39, @mkow, and @ynonflumintel)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Do I understand correctly (at least looking at #2013 (comment)) that the recovery file stacks all changes that happened while the file was open? This may be a lot for long-running enclaves, even if the file on the disk is itself small, but often modified?

No, the recovery file is limited to the current write transaction. During each file flush, only the cached blocks about to change are saved to the recovery file. The recovery file is then truncated and rewritten with the latest pending changes.

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

We probably should benchmark this before merging (see #2013 (comment)).

Sure, I'll run some micro and macro benchmarks later and share the results here.

common/src/protected_files/protected_files.h line 313 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I don't understand this term, "size of position" - is this the size in bytes of a number indicating the position in the file? But that doesn't make much sense.

Yes, it is exactly the size in bytes of a number indicating the position in the file. I made it initially for flexibility, but I have now removed it for clarity. Also added some comments in recover_encrypted_file().

common/src/protected_files/protected_files.h line 314 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (space)

Done.

common/src/protected_files/protected_files.h line 319 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please add out_ prefix to the out arguments

Done.

common/src/protected_files/protected_files.c line 452 at r1 (raw file):

Previously, efu39 (Erica Fu) wrote…

nitpicking: maybe moving the 'offset' declaration a few lines back, together with 'node'

Done.

common/src/protected_files/protected_files.c line 522 at r1 (raw file):

Previously, efu39 (Erica Fu) wrote…

maybe use "goto recoverable_error;" instead for consistency?

Done.

common/src/protected_files/protected_files.c line 528 at r1 (raw file):

Previously, efu39 (Erica Fu) wrote…

same as above

Done.

common/src/protected_files/protected_files_format.h line 59 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

https://gramine.readthedocs.io/en/latest/devel/encfiles.html will need an update

Done.

common/src/protected_files/protected_files_format.h line 59 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

or something similar, update_flag sounds very unclear what it means

Done.

Documentation/manifest-syntax.rst line 1158 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please be concise - instead of this just add "(default: false)" after the first mention of this option.

Done.

libos/src/fs/libos_fs_encrypted.c line 277 at r1 (raw file):

Previously, efu39 (Erica Fu) wrote…

suggest changing the message to 'file recovery attempted but failed' for clarity.

Done.

libos/src/fs/libos_fs_encrypted.c line 328 at r1 (raw file):

Previously, efu39 (Erica Fu) wrote…

wondering if PalStreamDelete(enc->recovery_file_pal_handle, ..) should be invoked as well when pf_close() fails above?

I intentionally made it this way; added a comment there.

pal/include/pal/pal.h line 1050 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Parameters don't match the signature.

Done.

pal/include/pal/pal.h line 1051 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto, what's a "size of a position"?

Not relevant any more.

pal/include/pal/pal.h line 1056 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This sounds like a bad function name, there's no verb in it and I was confused what it actually does before reading the more detailed documentation.

Done.

common/src/protected_files/protected_files.h line 222 at r1 (raw file):

 * \param      create                Overwrite file contents if true.
 * \param      key                   Wrap key.
 * \param      recovery_file_handle  (optional)Underlying recovery file handle.

Done.

kailun-qin · 2025-02-11T12:07:17Z

libos/include/libos_fs.h

+
+    /* Whether to enable file recovery (used by `chroot_encrypted` filesystem), false if not
+     * applicable */
+    bool enable_recovery;


Thanks for the input! I go w/ the first approach -- not allowing non-recoverable mounts if a recovery is needed.

kailun-qin · 2025-02-11T12:07:17Z

libos/include/libos_fs_encrypted.h

 *
 * `uri` must not correspond to an existing file.
 *
 * The newly created `libos_encrypted_file` object will have `use_count` set to 1.
 */
 int encrypted_file_create(const char* uri, mode_t perm, struct libos_encrypted_files_key* key,
-                          struct libos_encrypted_file** out_enc);
+                          bool enable_recovery, struct libos_encrypted_file** out_enc);


I left it as is for now.

mkow

Reviewed 14 of 27 files at r2, all commit messages.
Reviewable status: 21 of 34 files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @efu39, @kailun-qin, and @ynonflumintel)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

No, the recovery file is limited to the current write transaction. During each file flush, only the cached blocks about to change are saved to the recovery file. The recovery file is then truncated and rewritten with the latest pending changes.

Ah. Could you update the comment I linked to, then? I think it's missing that step.

pal/include/pal/pal.h line 1050 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Done.

Not done?

a discussion (no related file):
Could you also update the PR description with update_flag -> has_pending_write?

-- commits line 23 at r2:

Suggestion:

`has_pending_write` flag

mkow

Reviewed 1 of 27 files at r2.
Reviewable status: 22 of 34 files reviewed, 15 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @efu39, @kailun-qin, and @ynonflumintel)

pal/src/host/linux-common/file_utils.c line 221 at r2 (raw file):

        }

        ret = write_all(file_fd, recovery_node + sizeof(uint64_t), node_size);

Why not use struct recovery_node_t here? You kind of hardcode its layout here anyway, and changing the layout of struct recovery_node_t will silently break this code.
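To illustrate the point, here is a sketch of reading a record through the struct instead of via `recovery_node + sizeof(uint64_t)`. The field names `physical_node_number` and `bytes` appear elsewhere in this review; the exact struct definition, the 4096-byte node size and the helper `parse_recovery_node()` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PF_NODE_SIZE 4096 /* assumed node size */

/* Assumed layout of one record in the recovery file. */
typedef struct {
    uint64_t physical_node_number;
    uint8_t  bytes[PF_NODE_SIZE];
} recovery_node_t;

/* Hypothetical helper: copy one raw record from a buffer into the struct. Callers then use
 * `out_node->bytes` rather than pointer arithmetic, so a layout change in the struct
 * is picked up automatically. */
static void parse_recovery_node(const uint8_t* raw, recovery_node_t* out_node) {
    assert(raw != NULL && out_node != NULL);
    memcpy(out_node, raw, sizeof(*out_node));
}
```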

common/src/protected_files/protected_files.c line 1416 at r2 (raw file):

    if (out_node_size)
        *out_node_size = sizeof(((recovery_node_t*)0)->bytes);

Why is this parameter dynamic? Don't we hardcode the node size anyways, in the PF implementation?

kailun-qin

Reviewable status: 16 of 34 files reviewed, 15 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @kailun-qin, @mkow, and @ynonflumintel)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Ah. Could you update the comment I linked to, then? I think it's missing that step.

Done.

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you also update the PR description with update_flag -> has_pending_write?

Done.

common/src/protected_files/protected_files.c line 1416 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why is this parameter dynamic? Don't we hardcode the node size anyways, in the PF implementation?

Not relevant any more.

pal/include/pal/pal.h line 1050 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Not done?

Not relevant any more.

pal/src/host/linux-common/file_utils.c line 221 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why not use struct recovery_node_t here? You kind of hardcode its layout here anyway, and changing the layout of struct recovery_node_t will silently break this code.

Yeah, but struct recovery_node_t is defined in the generic pf lib, which is currently decoupled from PAL. I rethought a bit the design and implemented the logic entirely within the common pf. See if you like this better (or not).

-- commits line 23 at r2:
Done.

mkow

Reviewed 5 of 27 files at r1, 18 of 18 files at r3, all commit messages.
Reviewable status: all files reviewed, 23 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @kailun-qin, and @ynonflumintel)

pal/src/host/linux-common/file_utils.c line 221 at r2 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Yeah, but struct recovery_node_t is defined in the generic pf lib, which is currently decoupled from PAL. I rethought a bit the design and implemented the logic entirely within the common pf. See if you like this better (or not).

Looks better now, IMO.

libos/test/fs/manifest.template line 14 at r3 (raw file):


  { type = "encrypted", path = "/tmp/enc_input", uri = "file:tmp/enc_input" },
  { type = "encrypted", path = "/tmp/enc_output", uri = "file:tmp/enc_output", enable_recovery = true },

Does this test actually test this feature? If I didn't miss anything, this file is used only in a test which only writes to this file once, so not much will be tested?
Could we have a specialized test which also tests the recovery system?

libos/src/fs/libos_fs_encrypted.c line 182 at r3 (raw file):

        if (enc->enable_recovery) {
            const char* recovery_file_suffix = "_gramine_recovery";

I'd rather append such a suffix, otherwise it will look like it's concatenated to the original extension.

Suggestion:

".gramine_recovery"

libos/src/fs/libos_fs_encrypted.c line 193 at r3 (raw file):

            memcpy(recovery_file_uri, enc->uri, uri_len);
            memcpy(recovery_file_uri + uri_len, recovery_file_suffix, suffix_len);
            recovery_file_uri[uri_len + suffix_len] = '\0';

These ten lines can be all replaced by just:

char* recovery_file_uri = alloc_concat(enc->uri, -1, ".gramine_recovery", -1);
if (!recovery_file_uri) {
    ret = -ENOMEM;
    goto out;
}
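For readers without the Gramine tree at hand, the effect of that call can be approximated in plain C as below. `concat_alloc()` is a hypothetical stand-in written for this example, not Gramine's `alloc_concat()`; it only shows the allocate-and-join behavior the suggestion relies on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: return a newly allocated string holding `a` followed by `b`,
 * or NULL if the allocation fails. The caller frees the result. */
static char* concat_alloc(const char* a, const char* b) {
    assert(a != NULL && b != NULL);
    size_t a_len = strlen(a);
    size_t b_len = strlen(b);

    char* out = malloc(a_len + b_len + 1);
    if (!out)
        return NULL;

    memcpy(out, a, a_len);
    memcpy(out + a_len, b, b_len + 1); /* also copies the terminating NUL */
    return out;
}
```

For example, joining `"file:tmp/enc_output"` and `".gramine_recovery"` yields `"file:tmp/enc_output.gramine_recovery"`.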

libos/src/fs/libos_fs_encrypted.c line 311 at r3 (raw file):

        PalObjectDestroy(enc->recovery_file_pal_handle);
    enc->recovery_file_pal_handle = NULL;
    return;

unrelated to the PR, but we can remove this useless return while we're here

common/src/protected_files/protected_files.h line 223 at r3 (raw file):

 * \param      key                   Wrap key.
 * \param      recovery_file_handle  (optional) Underlying recovery file handle.
 * \param      recovery_file_size    Recovery file size.

Is this also optional? If so, what value should it have to mark it as "not set"? Also, why is this an argument? We already pass a handle to the file, so why is the caller responsible for retrieving the file size rather than this function? Doesn't seem very natural.

update: is this because there's no cb_getsize in pf callbacks?

common/src/protected_files/protected_files.h line 232 at r3 (raw file):

                    pf_file_mode_t mode, bool create, const pf_key_t* key,
                    pf_handle_t recovery_file_handle, uint64_t recovery_file_size,
                    bool try_cover, pf_context_t** context);

It's a bit worrying that this compiled without any warnings. Any idea why this wasn't detected? I thought there was a warning for that? (mismatch between a declaration and the definition)

Suggestion:

bool try_recover

common/src/protected_files/protected_files.c line 449 at r3 (raw file):

}

static bool ipf_write_recovery_file(pf_context_t* pf) {

This way it's IMO more direct about what it does. When I started reading it I was wondering how it knows which nodes to dump if it only gets pf as an argument.

Suggestion:

ipf_dump_dirty_cache_to_recovery_file

common/src/protected_files/protected_files.c line 496 at r3 (raw file):

static bool ipf_clear_pending_write(pf_context_t* pf) {
    assert(pf->metadata_node.plaintext_part.has_pending_write == 0);

It's a bit weird now, that this function may assert-fail if it's called right after ipf_check_recovery_needed() (because it loads the disk node to memory). Maybe ipf_check_recovery_needed() should read the flag from disk, but clear again from the in-memory copy?

common/src/protected_files/protected_files.c line 980 at r3 (raw file):

    }

    size_t recovery_nodes_count = recovery_file_size / sizeof(recovery_node_t);

This is disk size, not memory size.

Suggestion:

uint64_t

common/src/protected_files/protected_files.c line 980 at r3 (raw file):

    }

    size_t recovery_nodes_count = recovery_file_size / sizeof(recovery_node_t);

Hmm, if we just read until EOF then we wouldn't need to pass the size through the whole hierarchy?

common/src/protected_files/protected_files.c line 982 at r3 (raw file):

    size_t recovery_nodes_count = recovery_file_size / sizeof(recovery_node_t);

    for (size_t i = 0; i < recovery_nodes_count; i++) {

Suggestion:

uint64_t

common/src/protected_files/protected_files.c line 992 at r3 (raw file):

        }

        size_t offset = recovery_node.physical_node_number;

Just to keep that in mind. I don't think it can lead to any problems, but I prefer to keep untrusted values clearly marked.

Suggestion:

uint64_t untrusted_offset
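One way such an untrusted value could be checked before use is sketched below. It treats the value as a node index that is multiplied by an assumed 4096-byte node size; whether the PR stores an index or a byte offset, and whether it performs this check, is not stated here, and `node_number_to_offset()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PF_NODE_SIZE UINT64_C(4096) /* assumed node size */

/* Hypothetical helper: convert an untrusted node number into a byte offset, rejecting
 * values that overflow or that do not leave room for a whole node inside the file. */
static bool node_number_to_offset(uint64_t untrusted_node_number, uint64_t file_size,
                                  uint64_t* out_offset) {
    assert(out_offset != NULL);

    /* The multiplication below must not wrap around. */
    if (untrusted_node_number > UINT64_MAX / PF_NODE_SIZE)
        return false;

    uint64_t offset = untrusted_node_number * PF_NODE_SIZE;
    if (offset >= file_size || file_size - offset < PF_NODE_SIZE)
        return false;

    *out_offset = offset;
    return true;
}
```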

common/src/protected_files/protected_files.c line 1057 at r3 (raw file):

    if (!create) {
        if (!ipf_init_existing_file(pf, path))
            goto out;

missing last_error setting?

common/src/protected_files/protected_files.c line 1063 at r3 (raw file):

            if (!ipf_recover(pf, recovery_file_size))
                goto out;

Missing last_error setting? Also, why no message here, but a message in the if below?

kailun-qin

Reviewable status: 31 of 34 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @kailun-qin, @mkow, and @ynonflumintel)

common/src/protected_files/protected_files.h line 223 at r3 (raw file):

Is this also optional? If so, what value should it have to mark it as "not set"?

Yes, added a comment.

is this because there's no cb_getsize in pf callbacks?

Yes, we did the same for handle and underlying_size.

common/src/protected_files/protected_files.h line 232 at r3 (raw file):
Good catch, thanks.

Any idea why this wasn't detected?

Well, in the C standard, the identifiers of parameters in a function declaration are optional; only the types really matter.

I thought there was a warning for that? (mismatch between a declaration and the definition)

Yeah, but it's not built into the compiler. Instead, it's available in some static analysis tools, e.g., clang-tidy https://clang.llvm.org/extra/clang-tidy/checks/readability/inconsistent-declaration-parameter-name.html.
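A minimal reproduction of the situation: the declaration and the definition below use different parameter names, which is valid C because only the parameter types take part in the compatibility check. The function `open_file()` is a hypothetical example, not the `pf_open()` signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Declaration (as in a header) with the misspelled parameter name. */
int open_file(const char* path, bool try_cover);

/* Definition with the intended name. The two are compatible because the types match. */
int open_file(const char* path, bool try_recover) {
    assert(path != NULL);
    return try_recover ? 1 : 0;
}
```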

common/src/protected_files/protected_files.c line 449 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This way it's IMO more direct about what it does. When I started reading it I was wondering how it knows which nodes to dump if it only gets pf as an argument.

Done.

common/src/protected_files/protected_files.c line 496 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

It's a bit weird now, that this function may assert-fail if it's called right after ipf_check_recovery_needed() (because it loads the disk node to memory). Maybe ipf_check_recovery_needed() should read the flag from disk, but clear again from the in-memory copy?

The check is only used when initializing an existing file. I've now inlined the check, moved it earlier, and ensured the flag is cleared in the in-mem copy.

common/src/protected_files/protected_files.c line 980 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Hmm, if we just read until EOF then we wouldn't need to pass the size through the whole hierarchy?

Well, yes. But reading and applying recovery nodes one by one risks main file corruption if the recovery file size is incorrect. Alternatively, we can dynamically allocate memory for all recovery nodes, read until EOF and validate them as a whole, and then apply them, but this sounds a bit overkill.
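A sketch of the kind of up-front size check this argument relies on: reject the recovery file before touching the main file if its size is not a whole number of records. The struct layout, the 4096-byte node size and `get_recovery_nodes_count()` are assumptions for illustration; the PR's actual validation may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PF_NODE_SIZE 4096 /* assumed node size */

/* Assumed layout of one record in the recovery file. */
typedef struct {
    uint64_t physical_node_number;
    uint8_t  bytes[PF_NODE_SIZE];
} recovery_node_t;

/* Hypothetical helper: derive the record count from the file size, failing if the file
 * is empty or ends in a partial record. Nothing is applied to the main file on failure. */
static bool get_recovery_nodes_count(uint64_t recovery_file_size, uint64_t* out_count) {
    assert(out_count != NULL);

    if (recovery_file_size == 0 || recovery_file_size % sizeof(recovery_node_t) != 0)
        return false;

    *out_count = recovery_file_size / sizeof(recovery_node_t);
    return true;
}
```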

common/src/protected_files/protected_files.c line 980 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is disk size, not memory size.

Done.

common/src/protected_files/protected_files.c line 992 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Just to keep that in mind. I don't think it can lead to any problems, but I prefer to keep untrusted values clearly marked.

Done.

common/src/protected_files/protected_files.c line 1057 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

missing last_error setting?

The last error is already set in ipf_init_existing_file().

common/src/protected_files/protected_files.c line 1063 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Missing last_error setting? Also, why no message here, but a message in the if below?

Not relevant any more.

libos/src/fs/libos_fs_encrypted.c line 182 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd rather append such a suffix, otherwise it will look like it's concatenated to the original extension.

Done.

libos/src/fs/libos_fs_encrypted.c line 193 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

These ten lines can be all replaced by just:

char* recovery_file_uri = alloc_concat(enc->uri, -1, ".gramine_recovery", -1);
if (!recovery_file_uri) {
    ret = -ENOMEM;
    goto out;
}

Done.

libos/src/fs/libos_fs_encrypted.c line 311 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

unrelated to the PR, but we can remove this useless return while we're here

Done.

libos/test/fs/manifest.template line 14 at r3 (raw file):

Does this test actually test this feature? If I didn't miss anything, this file is used only in a test which only writes to this file once, so not much will be tested?

Only the extended flush logic, not the recovery flow.

Could we have a specialized test which also tests the recovery system?

Yeah, but it's challenging to test this automatically. Controlling when the test should crash abruptly to generate a valid dump with the pending write flag set is difficult. One way I can think of for testing the recovery flow (not the entire flush-then-recover process) is to use some pre-generated corrupted main files with dumps.

common/src/protected_files/protected_files.c line 982 at r3 (raw file):

    size_t recovery_nodes_count = recovery_file_size / sizeof(recovery_node_t);

    for (size_t i = 0; i < recovery_nodes_count; i++) {

Done.

mkow

Reviewed 2 of 3 files at r4, all commit messages.
Reviewable status: 33 of 34 files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @kailun-qin, and @ynonflumintel)

common/src/protected_files/protected_files.h line 232 at r3 (raw file):

Well, in the C standard, the identifiers of parameters in a function declaration are optional; only the types really matter.

Yeah, I know that, but I thought the compilers were already warning about that, not only linters :(

a discussion (no related file):
btw. I think you can mark this PR as ready (it's still a draft)

mkow

Reviewed 1 of 3 files at r4.
Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @kailun-qin, and @ynonflumintel)

libos/test/fs/manifest.template line 14 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Does this test actually test this feature? If I didn't miss anything, this file is used only in a test which only writes to this file once, so not much will be tested?

Only the extended flush logic, not the recovery flow.

Could we have a specialized test which also tests the recovery system?

Yeah, but it's challenging to test this automatically. Controlling when the test should crash abruptly to generate a valid dump with the pending write flag set is difficult. One way I can think of for testing the recovery flow (not the entire flush-then-recover process) is to use some pre-generated corrupted main files with dumps.

Hmm, what are the exact scenarios when we can end up with a corrupted PF? Is it only if we crash in the middle of ipf_internal_flush()? If so, I guess it's not possible to crash at any point in the app manually and cause this corruption?
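For readers following this thread, the recovery flow being discussed can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from the PR: the node size, the `sketch_recovery_node_t` layout (physical node number plus the saved node contents, as outlined in the PR description) and the helper name `sketch_apply_recovery_nodes` are all hypothetical, and the main file is modeled as an in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node size for this sketch; the real PF node size may differ. */
#define SKETCH_NODE_SIZE 16

/* Assumed layout: where the node lives in the main file, plus its last known good contents. */
typedef struct {
    uint64_t physical_node_number;
    uint8_t  bytes[SKETCH_NODE_SIZE];
} sketch_recovery_node_t;

/* Copies every saved node back to its slot in `main_file` (an in-memory stand-in for the
 * main file). Returns 0 on success, -1 if the recovery data is malformed. Nothing is
 * written unless all nodes pass validation. */
static int sketch_apply_recovery_nodes(uint8_t* main_file, size_t main_file_size,
                                       const uint8_t* recovery_file,
                                       size_t recovery_file_size) {
    assert(main_file && recovery_file);

    if (recovery_file_size % sizeof(sketch_recovery_node_t) != 0)
        return -1; /* truncated dump */

    size_t recovery_nodes_count = recovery_file_size / sizeof(sketch_recovery_node_t);
    size_t main_nodes_count = main_file_size / SKETCH_NODE_SIZE;

    /* First pass: validate, so that a bad dump cannot cause a partial rollback. */
    for (size_t i = 0; i < recovery_nodes_count; i++) {
        sketch_recovery_node_t node;
        memcpy(&node, recovery_file + i * sizeof(node), sizeof(node));
        if (node.physical_node_number >= main_nodes_count)
            return -1; /* node points outside the main file */
    }

    /* Second pass: restore the saved contents. */
    for (size_t i = 0; i < recovery_nodes_count; i++) {
        sketch_recovery_node_t node;
        memcpy(&node, recovery_file + i * sizeof(node), sizeof(node));
        memcpy(main_file + node.physical_node_number * SKETCH_NODE_SIZE, node.bytes,
               SKETCH_NODE_SIZE);
    }
    return 0;
}
```

A test built on pre-generated inputs, as suggested above, would feed a routine like this a deliberately inconsistent main file together with its dump and then compare the result against the known good contents.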

kailun-qin

Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @mkow, and @ynonflumintel)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

btw. I think you can mark this PR as ready (it's still a draft)

Done, thanks for reminding!

libos/test/fs/manifest.template line 14 at r3 (raw file):

Is it only if we crash in the middle of ipf_internal_flush()?

Yes.

If so, I guess it's not possible to crash at any point in the app manually and cause this corruption?

Yes, I think so.

mkow

Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @kailun-qin, and @ynonflumintel)

libos/test/fs/manifest.template line 14 at r3 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Is it only if we crash in the middle of ipf_internal_flush()?

Yes.

If so, I guess it's not possible to crash at any point in the app manually and cause this corruption?

Yes, I think so.

Ok, then it's quite problematic to test... Could you at least add a comment / warning somewhere that this feature is not tested in CI, because it's hard to test this scenario?

a discussion (no related file):
@efu39, @ynonflumintel: Please re-review the PR, you are still blocking it in a few discussions.

efu39

Reviewed 9 of 27 files at r2, 15 of 18 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @ynonflumintel)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

@efu39, @ynonflumintel: Please re-review the PR, you are still blocking it in a few discussions.

Done. Sorry for the late response.

kailun-qin

Reviewable status: 30 of 34 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @efu39, @mkow, and @ynonflumintel)

libos/test/fs/manifest.template line 14 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ok, then it's quite problematic to test... Could you at least add a comment / warning somewhere that this feature is not tested in CI, because it's hard to test this scenario?

Done.

a discussion (no related file):
We've received new requests from internal customers for Gramine to be backward-compatible (when the feature is disabled/enabled) with files created by older Gramine versions.

I don't recall us discussing version management of Gramine's encrypted filesystem for compatibility, and I think SGX SDK PFS doesn't support this either. We should consider discussing this in an upcoming Gramine core meeting.

mkow

Reviewed 4 of 4 files at r5, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @ynonflumintel)

a discussion (no related file):

Previously, efu39 (Erica Fu) wrote…

Done. Sorry for the late response.

Thanks!
@ynonflumintel: ping ^

a discussion (no related file):

to be backward-compatible (when the feature is disabled/enabled) with files created by older Gramine versions.

What do you mean by that, exactly?

libos/src/fs/chroot/encrypted.c line 122 at r5 (raw file):

        goto out;

    if (strendswith(uri, RECOVERY_FILE_URI_SUFFIX)) {

What happens if the user app tries to open or create (and possibly overwrite) a file with the RECOVERY_FILE_URI_SUFFIX suffix? Should we also hide those files from it?
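The kind of filtering being asked about here could look roughly like the sketch below. It is only an illustration: the suffix value `.recovery` is a made-up placeholder (the actual value of `RECOVERY_FILE_URI_SUFFIX` is not shown in this thread), and `sketch_strendswith()` is a local stand-in for the `strendswith()` call quoted above, not its real implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix for this sketch only. */
#define SKETCH_RECOVERY_SUFFIX ".recovery"

/* Returns true if `str` ends with `suffix`. */
static bool sketch_strendswith(const char* str, const char* suffix) {
    assert(str && suffix);
    size_t str_len = strlen(str);
    size_t suffix_len = strlen(suffix);
    if (suffix_len > str_len)
        return false;
    return memcmp(str + str_len - suffix_len, suffix, suffix_len) == 0;
}

/* Returns true if the application should not be able to see, open, or create this URI,
 * because the name is reserved for the shadow recovery file of another encrypted file. */
static bool sketch_is_hidden_from_app(const char* uri) {
    return sketch_strendswith(uri, SKETCH_RECOVERY_SUFFIX);
}
```

Applying one predicate on every path that resolves a name (lookup, create, rename target) is what would keep an app from overwriting a recovery file by guessing its name.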

kailun-qin

Reviewable status: 33 of 34 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow and @ynonflumintel)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

to be backward-compatible (when the feature is disabled/enabled) with files created by older Gramine versions.

What do you mean by that, exactly?

For example, users might have encrypted files generated by an older version of Gramine (e.g., v1.8, which doesn't support file recovery). When they upgrade to the new version of Gramine with file recovery support, they would expect their existing encrypted files to still be accessible (w/ recovery enabled/disabled on these files). However, since we've added the has_pending_write flag in metadata_plaintext_t, which changes the layout of the metadata node, this scenario is not currently supported.
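To make the layout problem concrete, here is a small sketch of why appending one byte to the plaintext metadata breaks old files. The nonce, MAC, and encrypted-part sizes are assumptions chosen for illustration, fields preceding `major_version` are omitted, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes are assumptions for illustration only. */
typedef uint8_t sketch_nonce_t[32];
typedef uint8_t sketch_mac_t[16];

#pragma pack(push, 1)
/* Plaintext metadata before the change (earlier fields omitted). */
typedef struct {
    uint8_t        major_version;
    uint8_t        minor_version;
    sketch_nonce_t metadata_key_nonce;
    sketch_mac_t   metadata_mac;
} sketch_plaintext_old_t;

/* Plaintext metadata after the change: one extra byte at the end. */
typedef struct {
    uint8_t        major_version;
    uint8_t        minor_version;
    sketch_nonce_t metadata_key_nonce;
    sketch_mac_t   metadata_mac;
    uint8_t        has_pending_write;
} sketch_plaintext_new_t;

/* Whatever follows the plaintext part in the metadata node moves with it. */
typedef struct {
    sketch_plaintext_old_t plaintext;
    uint8_t                encrypted_part[64];
} sketch_node_old_t;

typedef struct {
    sketch_plaintext_new_t plaintext;
    uint8_t                encrypted_part[64];
} sketch_node_new_t;
#pragma pack(pop)

/* Number of bytes by which the encrypted part shifts between the two layouts. */
static size_t sketch_encrypted_part_shift(void) {
    size_t old_offset = offsetof(sketch_node_old_t, encrypted_part);
    size_t new_offset = offsetof(sketch_node_new_t, encrypted_part);
    assert(new_offset >= old_offset);
    return new_offset - old_offset;
}
```

With these assumed structs the shift is a single byte, which is enough for a new reader to decrypt and MAC-check the wrong byte range of an old file.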

libos/src/fs/chroot/encrypted.c line 122 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What happens if the user app tries to open or create (and possibly overwrite) a file with the RECOVERY_FILE_URI_SUFFIX suffix? Should we also hide those files from it?

Done.

kailun-qin · 2025-03-10T06:42:07Z

Jenkins, retest Jenkins-SGX-22.04-Sanitizers please (no available TCS pages left for a new thread, known and unrelated to the PR)

mkow

Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @ynonflumintel)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

For example, users might have encrypted files generated by an older version of Gramine (e.g., v1.8, which doesn't support file recovery). When they upgrade to the new version of Gramine with file recovery support, they would expect their existing encrypted files to still be accessible (w/ recovery enabled/disabled on these files). However, since we've added the has_pending_write flag in metadata_plaintext_t, which changes the layout of the metadata node, this scenario is not currently supported.

Ah, I see. But we've never supported backward compatibility for PF and didn't include it in the design... How would you want to add that here?

libos/src/fs/chroot/encrypted.c line 195 at r6 (raw file):

}

static int chroot_encrypted_open(struct libos_handle* hdl, struct libos_dentry* dent, int flags) {

Open is not using lookup(), so I guess we also need to filter those files here?

kailun-qin

Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow and @ynonflumintel)

a discussion (no related file):

But we've never supported backward compatibility for PF and didn't include it in the design...

Yes, that's correct. That's also why I'm not sure if we want to support this (in this PR or separately).

How would you want to add that here?

I'm not sure yet. I'm thinking about using the versioning that already exists in metadata_plaintext_t:

gramine/common/src/protected_files/protected_files_format.h

Lines 55 to 56 in ef48c72

    uint8_t    major_version;
    uint8_t    minor_version;

We could version the data structures differently for field changes and use them separately based on the version metadata of an existing file upon opening. However, this seems rather ad-hoc. I think we need a more general convention for defining/updating these major/minor versions.

I'm also unclear about the expected behaviors (e.g., should we automatically update the file version when accessed by a newer version of Gramine; should we allow access to newer-versioned files using an older-versioned Gramine). I'm reaching out to our internal users to clarify their requirements and to join our core meeting discussions, so we can figure out the next steps.
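One possible reading of the version-based idea above, written as a sketch: the version bytes are inspected on open and select which metadata layout to parse. The version numbers, the enum, and the function name are all hypothetical; the thread does not settle which values or policy would be used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version numbers for this sketch only. */
#define SKETCH_MAJOR_VERSION          1
#define SKETCH_MINOR_VERSION_LEGACY   0 /* no has_pending_write field */
#define SKETCH_MINOR_VERSION_RECOVERY 1 /* has_pending_write field present */

typedef struct {
    uint8_t major_version;
    uint8_t minor_version;
} sketch_version_t;

typedef enum {
    SKETCH_LAYOUT_UNSUPPORTED,
    SKETCH_LAYOUT_LEGACY,
    SKETCH_LAYOUT_WITH_PENDING_WRITE,
} sketch_layout_t;

/* Picks the metadata layout to use for an existing file, based on its stored version. */
static sketch_layout_t sketch_select_layout(const sketch_version_t* version) {
    assert(version);

    if (version->major_version != SKETCH_MAJOR_VERSION)
        return SKETCH_LAYOUT_UNSUPPORTED;

    switch (version->minor_version) {
        case SKETCH_MINOR_VERSION_LEGACY:
            return SKETCH_LAYOUT_LEGACY;
        case SKETCH_MINOR_VERSION_RECOVERY:
            return SKETCH_LAYOUT_WITH_PENDING_WRITE;
        default:
            /* Newer than this reader understands: refuse rather than guess. */
            return SKETCH_LAYOUT_UNSUPPORTED;
    }
}
```

The open questions raised above (whether to upgrade a file's version on access, and what an older Gramine should do with a newer file) would map to what happens after this selection, which the sketch deliberately leaves out.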

libos/src/fs/chroot/encrypted.c line 195 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Open is not using lookup(), so I guess we also need to filter those files here?

Pls correct me if I'm wrong, but open is using lookup():

gramine/libos/src/fs/libos_namei.c

Line 438 in ef48c72

ret = path_lookupat(start, path, lookup_flags, &dent);

and this negative dentry would eventually end up here:

gramine/libos/src/fs/libos_namei.c

Lines 478 to 482 in ef48c72

    if (!dent->inode) {
        if (!(flags & O_CREAT)) {
            ret = -ENOENT;
            goto out;
        }

?

mkow

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @ynonflumintel)

libos/src/fs/chroot/encrypted.c line 195 at r6 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Pls correct me if I'm wrong, but open is using lookup():

gramine/libos/src/fs/libos_namei.c

Line 438 in ef48c72

ret = path_lookupat(start, path, lookup_flags, &dent);

and this negative dentry would eventually end up here:

gramine/libos/src/fs/libos_namei.c

Lines 478 to 482 in ef48c72

    if (!dent->inode) {
        if (!(flags & O_CREAT)) {
            ret = -ENOENT;
            goto out;
        }

?

Ah, right. Resolving.

cloudnoize · 2025-03-11T16:18:06Z

common/src/protected_files/protected_files_format.h

@@ -56,6 +56,7 @@ typedef struct {
    uint8_t    minor_version;
    pf_nonce_t metadata_key_nonce;
    pf_mac_t   metadata_mac; /* GCM mac */
+    uint8_t    has_pending_write; /* flag for file recovery */


Wouldn't it be better to introduce a generic flags field where each bit represents a flag?
For future extensibility with upgrade support
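A generic flags field of the kind suggested here could be handled roughly as in the sketch below. The bit assignment and the helper names are hypothetical and only illustrate the idea of one bit per flag inside a single `uint8_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignment; bits 1..7 would stay reserved (zero) for future flags. */
#define SKETCH_FLAG_HAS_PENDING_WRITE ((uint8_t)(1u << 0))

/* Marks the start of a write transaction. */
static void sketch_set_pending_write(uint8_t* flags) {
    assert(flags);
    *flags = (uint8_t)(*flags | SKETCH_FLAG_HAS_PENDING_WRITE);
}

/* Marks the transaction as complete, leaving all other bits untouched. */
static void sketch_clear_pending_write(uint8_t* flags) {
    assert(flags);
    *flags = (uint8_t)(*flags & (uint8_t)~SKETCH_FLAG_HAS_PENDING_WRITE);
}

/* Tells the open path whether recovery has to run. */
static bool sketch_has_pending_write(const uint8_t* flags) {
    assert(flags);
    return (*flags & SKETCH_FLAG_HAS_PENDING_WRITE) != 0;
}
```

Compared with a dedicated `has_pending_write` byte, this keeps the on-disk size fixed when further flags are introduced, at the cost of readers having to decide how to treat bits they do not recognize.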

Pls check the update (w/ also the compatibility support).

FYI: I dismissed @cloudnoize block here, as they are not a maintainer and are not responding.

kailun-qin · 2025-03-18T04:29:07Z

Jenkins, retest Jenkins-Direct-22.04 please (fdatasync01 LTP test timed out, known and unrelated)

kailun-qin · 2025-03-18T04:29:31Z

Jenkins, retest Jenkins-Direct-22.04-Debug please (fdatasync01 LTP test timed out, known and unrelated)

kailun-qin · 2025-03-18T04:29:56Z

Jenkins, retest Jenkins-Direct-24.04-Debug please (fdatasync01 LTP test timed out, known and unrelated)

mkow

Reviewed 7 of 7 files at r9, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @cloudnoize, @kailun-qin, and @ynonflumintel)

kailun-qin

Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @cloudnoize, @jinengandhi-intel, @mkow, and @ynonflumintel)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

Sure, I'll run some micro and macro benchmarks later and share the results here.

@jinengandhi-intel: Jinen, could you pls post the performance tests results here so that we can move this PR forward? Thanks!

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

I think they're testing it (I'll double check). I tested it manually on my end.

Here is the feedback from our internal customer who requested this feature: "We have completed basic upgrade and run with the new Gramine version. It seems to be working as expected.".

I'm unblocking this comment.

vasanth-intel · 2025-04-14T06:49:56Z

@kailun-qin Please find the performance results of the workloads on the PR attached.

PR_2082_Feb_25_Performance_Results.xlsx

kailun-qin

@vasanth-intel: Thanks!

Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @cloudnoize, @jinengandhi-intel, @mkow, and @ynonflumintel)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

@jinengandhi-intel: Jinen, could you pls post the performance tests results here so that we can move this PR forward? Thanks!

Pls see the perf tests results above. I'm unblocking this comment.

mkow

@vasanth-intel: Please post the test results here (ideally as a table in Markdown).

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @cloudnoize, @jinengandhi-intel, and @ynonflumintel)

vasanth-intel · 2025-04-23T06:42:18Z

@vasanth-intel: Please post the test results here (ideally as a table in Markdown).

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @cloudnoize, @jinengandhi-intel, and @ynonflumintel)

@mkow: The spreadsheet attached above (PR_2082_Feb_25_Performance_Results.xlsx) contains the performance results for 6 workloads, each compared against the baseline data with the feature set to ON, OFF, and default (without 'enable_recovery'). The data comprises about 20 tables, which I don't think would be feasible or readable as Markdown here, so I did not add it as Markdown tables.

donporter · 2025-05-06T14:33:03Z

I took the liberty of taking data out of the xlsx.

Here is OpenVINO data:

Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2	NATIVE-MED	SGX-MED	DIRECT-MED	SGX-DEG	DIRECT-DEG
test_ov_perf_bert_large_fp16_throughput	24.12	19.51	23.355	19.113	3.172
test_ov_perf_bert_large_fp32_throughput	24.055	21.05	24.26	12.492	-0.852
test_ov_perf_bert_large_int8_throughput	69.975	64.485	69.685	7.846	0.414
test_ov_perf_brain_tumor_seg_0001_fp16_throughput	7.395	5.43	7.375	26.572	0.27
test_ov_perf_brain_tumor_seg_0001_fp32_throughput	7.33	5.635	7.34	23.124	-0.136
test_ov_perf_brain_tumor_seg_0002_fp16_throughput	10.145	5.575	10.155	45.047	-0.099
test_ov_perf_brain_tumor_seg_0002_fp32_throughput	10.105	5.59	10.025	44.681	0.792
test_ov_perf_resnet_fp16_throughput	861.785	730.895	859.335	15.188	0.284
test_ov_perf_resnet_fp32_throughput	841.8	718.35	841.49	14.665	0.037
test_ov_perf_ssd_mobilenet_fp16_throughput	2090.27	1373.375	2054.01	34.297	1.735
test_ov_perf_ssd_mobilenet_fp32_throughput	1992.03	1357.925	2035.815	31.832	-2.198

Run on PR 2082: enable_recovery not set explicitly
Jenkins build # 1729	vs v1.8
test_ov_perf_bert_large_fp16_throughput	0.076
test_ov_perf_bert_large_fp32_throughput	-1.286
test_ov_perf_bert_large_int8_throughput	-0.425
test_ov_perf_brain_tumor_seg_0001_fp16_throughput	-7.554
test_ov_perf_brain_tumor_seg_0001_fp32_throughput	-1.305
test_ov_perf_brain_tumor_seg_0002_fp16_throughput	0.014
test_ov_perf_brain_tumor_seg_0002_fp32_throughput	-0.198
test_ov_perf_resnet_fp16_throughput	-0.463
test_ov_perf_resnet_fp32_throughput	0.275
test_ov_perf_ssd_mobilenet_fp16_throughput	-2.024
test_ov_perf_ssd_mobilenet_fp32_throughput	0.924

Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2	NATIVE-MED	SGX-MED	DIRECT-MED	SGX-DEG	DIRECT-DEG
test_ov_perf_bert_large_fp16_latency	86.815	89.22	87.375	2.77	0.645
test_ov_perf_bert_large_fp32_latency	90.8	93.03	92.63	2.456	2.015
test_ov_perf_bert_large_int8_latency	39.81	40.595	39.885	1.972	0.188
test_ov_perf_brain_tumor_seg_0001_fp16_latency	277.56	306.385	278.985	10.385	0.513
test_ov_perf_brain_tumor_seg_0001_fp32_latency	278.78	309.11	280.77	10.88	0.714
test_ov_perf_brain_tumor_seg_0002_fp16_latency	205.37	212.355	206.65	3.401	0.623
test_ov_perf_brain_tumor_seg_0002_fp32_latency	205.465	213.815	206.715	4.064	0.608
test_ov_perf_resnet_fp16_latency	4.02	4.395	4.055	9.328	0.871
test_ov_perf_resnet_fp32_latency	4.095	4.475	4.165	9.28	1.709
test_ov_perf_ssd_mobilenet_fp16_latency	1.38	1.48	1.35	7.246	-2.174
test_ov_perf_ssd_mobilenet_fp32_latency	1.39	1.495	1.4	7.554	0.719
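For reading the tables above: the SGX-DEG and DIRECT-DEG columns appear to be percent degradation relative to the native median. That is an inference from the numbers rather than something stated in the spreadsheet, but it reproduces the posted values, e.g. (24.12 - 19.51) / 24.12 × 100 ≈ 19.113 for `test_ov_perf_bert_large_fp16_throughput` and (89.22 - 86.815) / 86.815 × 100 ≈ 2.770 for `test_ov_perf_bert_large_fp16_latency`. A minimal sketch of that calculation, with hypothetical function names:

```c
#include <assert.h>

/* Percent degradation for "higher is better" metrics (throughput):
 * how much of the native value is lost under Gramine. */
static double sketch_throughput_degradation_pct(double native, double gramine) {
    assert(native > 0.0);
    return (native - gramine) / native * 100.0;
}

/* Percent degradation for "lower is better" metrics (latency):
 * how much extra time is spent compared to native. */
static double sketch_latency_degradation_pct(double native, double gramine) {
    assert(native > 0.0);
    return (gramine - native) / native * 100.0;
}
```

A negative result under this convention means the Gramine run was measured as slightly better than native, which is most likely run-to-run noise.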

Run on PR 2082: enable_recovery not set explicitly
Jenkins build # 1726	vs v1.8
test_ov_perf_bert_large_fp16_latency	0.147
test_ov_perf_bert_large_fp32_latency	-0.256
test_ov_perf_bert_large_int8_latency	-0.013
test_ov_perf_brain_tumor_seg_0001_fp16_latency	0.067
test_ov_perf_brain_tumor_seg_0001_fp32_latency	1.283
test_ov_perf_brain_tumor_seg_0002_fp16_latency	0.084
test_ov_perf_brain_tumor_seg_0002_fp32_latency	-0.077
test_ov_perf_resnet_fp16_latency	-0.686
test_ov_perf_resnet_fp32_latency	-1.634
test_ov_perf_ssd_mobilenet_fp16_latency	2.013
test_ov_perf_ssd_mobilenet_fp32_latency	-0.36

donporter · 2025-05-06T14:37:35Z

SpecPower data:

Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2	NATIVE-MED	SGX-MED	DIRECT-MED	SGX-DEG	DIRECT-DEG
test_specpower_perf_throughput	2123410.49	1750118.11	2059715.45	17.58	3

Run on PR 2082: enable_recovery not set explicitly
Jenkins build # 1724	vs v1.8
test_specpower_perf_throughput	-0.079

donporter · 2025-05-06T14:39:41Z

MongoDB:

Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2	NATIVE-MED	SGX-MED	DIRECT-MED	SGX-DEG	DIRECT-DEG
test_mongodb_perf_set_operation_1_threads_throughput	10066.250, 9423.216, 10082.932, 8226.673, 9837.386	6628.939, 6250.940, 6604.045, 5806.552, 6545.866	9922.308, 9246.796, 9925.986, 8106.449, 9880.635	34.146, 33.663, 34.502, 29.404, 33.454	1.430, 1.872, 1.557, 1.455, -0.445
test_mongodb_perf_set_operation_2_threads_throughput	19745.063, 18134.304, 19616.490, 16010.475, 19297.100	12787.997, 12050.547, 12800.992, 11077.285, 12637.163	19032.672, 17927.149, 19350.408, 15832.256, 19037.492	35.232, 33.548, 34.741, 30.806, 34.51	3.607, 1.142, 1.341, 1.092, 1.33
test_mongodb_perf_set_operation_8_threads_throughput	69593.566, 64216.792, 70942.606, 56700.199, 69259.570	45240.123, 41527.879, 45349.346, 39019.070, 44287.129	68709.024, 63748.833, 69803.714, 55418.019, 67737.886	34.989, 35.328, 36.065, 31.178, 36.05	1.270, 0.706, 1.596, 2.247, 2.19
test_mongodb_perf_set_operation_16_threads_throughput	122930.589, 111201.344, 122201.593, 97882.8726, 121238.606	80145.579, 72765.649, 79543.321, 68216.458, 77669.900	118574.075, 105945.171, 118913.614, 95375.903, 116861.890	34.800, 34.555, 34.908, 30.305, 35.935	3.544, 4.724, 2.691, 2.549, 3.609
test_mongodb_perf_set_operation_32_threads_throughput	182860.191, 162478.208, 183383.094, 147808.165, 177914.473	125582.69, 113771.259, 125416.136, 106366.543, 124685.993	140307.930, 134929.739, 139338.126, 124859.532, 138667.886	31.317, 29.978, 31.606, 28.032, 29.909	23.283, 16.955, 24.005, 15.503, 22.092
test_mongodb_perf_set_operation_64_threads_throughput	193934.029, 190034.495, 192788.397, 186903.645, 192552.410	106103.82, 102098.152, 109108.501, 102225.439, 103853.042	109114.933, 106856.058, 110123.011, 106826.133, 108315.829	45.341, 46.296, 43.441, 45.316, 46.115	43.717, 43.774, 42.867, 42.827, 43.732
test_mongodb_perf_get_operation_1_threads_throughput	[57674.508, 58034.801, 57629.117, 58204.486, 57790.17]	[22093.049, 22187.593, 22234.742, 22363.264, 22180.299]	[51772.411, 51954.449, 51728.762, 51893.353, 51616.818]	[61.694, 61.768, 61.418, 61.578, 61.619]	[10.233, 10.477, 10.238, 10.843, 10.682]
test_mongodb_perf_get_operation_2_threads_throughput	[112548.738, 112365.874, 112927.306, 113329.468, 111377.946]	[43103.693, 43014.254, 43118.028, 43317.049, 43066.057]	[98750.393, 99188.773, 98943.769, 99446.397, 99652.151]	[61.702, 61.719, 61.818, 61.778, 61.333]	[12.26, 11.727, 12.383, 12.25, 10.528]
test_mongodb_perf_get_operation_8_threads_throughput	[394515.33, 388799.893, 388328.195, 361194.955, 362959.082]	[151596.323, 150081.576, 150553.595, 147750.924, 147874.353]	[364211.856, 298118.329, 325180.959, 319325.904, 364994.902]	[61.574, 61.399, 61.23, 59.094, 59.259]	[7.681, 23.323, 16.261, 11.592, -0.561]
test_mongodb_perf_get_operation_16_threads_throughput	[573131.68, 599509.42, 565375.228, 577013.796, 553154.126]	[251883.65, 255277.692, 256890.926, 258152.163, 246798.08]	[386313.321, 389928.305, 376528.233, 400643.215, 426919.584]	[56.051, 57.419, 54.563, 55.261, 55.383]	[32.596, 34.959, 33.402, 30.566, 22.821]
test_mongodb_perf_get_operation_32_threads_throughput	[490187.35, 506552.09, 519852.426, 539099.103, 545465.454]	[309850.992, 286140.843, 308554.299, 254476.726, 272924.69]	[251962.846, 228079.767, 266078.541, 230028.253, 222804.778]	[36.789, 43.512, 40.646, 52.796, 49.965]	[48.599, 54.974, 48.817, 57.331, 59.153]
test_mongodb_perf_get_operation_64_threads_throughput	[384796.427, 385091.449, 387413.567, 387246.817, 381337.097]	[192672.157, 190891.126, 193574.646, 190818.387, 192190.567]	[239273.309, 239935.596, 235406.522, 242424.127, 244440.596]	[49.929, 50.43, 50.034, 50.724, 49.601]	[37.818, 37.694, 39.236, 37.398, 35.899]

Run on PR 2082: enable_recovery not set explicitly
Jenkins build # 1728
test_mongodb_perf_set_operation_1_threads_throughput
test_mongodb_perf_set_operation_2_threads_throughput
test_mongodb_perf_set_operation_8_threads_throughput
test_mongodb_perf_set_operation_16_threads_throughput
test_mongodb_perf_set_operation_32_threads_throughput
test_mongodb_perf_set_operation_64_threads_throughput
test_mongodb_perf_get_operation_1_threads_throughput
test_mongodb_perf_get_operation_2_threads_throughput
test_mongodb_perf_get_operation_8_threads_throughput
test_mongodb_perf_get_operation_16_threads_throughput
test_mongodb_perf_get_operation_32_threads_throughput
test_mongodb_perf_get_operation_64_threads_throughput

donporter · 2025-05-06T14:43:41Z

MySQL Fix: https://github.com/gramineproject/gramine/tree/kailun-qin/skip-recovery-file-lookup-test

Baseline v1.8 - 1L Entries	Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2
NATIVE-MED	SGX-MED
test_mysql_perf_read_only_1_threads_throughput	15151.41
test_mysql_perf_read_only_8_threads_throughput	87322.72
test_mysql_perf_read_only_16_threads_throughput	138210.1
test_mysql_perf_read_only_32_threads_throughput	342504.2
test_mysql_perf_read_only_64_threads_throughput	535151.3
test_mysql_perf_write_only_1_threads_throughput	24434.04
test_mysql_perf_write_only_8_threads_throughput	183235.6
test_mysql_perf_write_only_16_threads_throughput	345821.5
test_mysql_perf_write_only_32_threads_throughput	426137.8
test_mysql_perf_read_write_1_threads_throughput	15440.38
test_mysql_perf_read_write_8_threads_throughput	88702.63
test_mysql_perf_read_write_16_threads_throughput	126891.1
test_mysql_perf_read_write_32_threads_throughput	335632.3
test_mysql_perf_read_write_64_threads_throughput	534319.6

Jenkins Build: 1760
1L Entries - Iteration 1	NATIVE-MED	SGX-MED	SGX-DEG	vs v1.8
test_mysql_perf_read_only_1_threads_throughput	15456.760	9482.940	38.649	0.48
test_mysql_perf_read_only_8_threads_throughput	92107.490	67058.190	27.196	2.962
test_mysql_perf_read_only_16_threads_throughput	149262.980	98574.400	33.959	4.702
test_mysql_perf_read_only_32_threads_throughput	351821.860	253550.850	27.932	-0.694
test_mysql_perf_read_only_64_threads_throughput	536486.190	386748.915	27.911	0.465
test_mysql_perf_write_only_1_threads_throughput	24254.400	13863.810	42.840	2.348
test_mysql_perf_write_only_8_threads_throughput	181801.665	92959.120	48.868	1.968
test_mysql_perf_write_only_16_threads_throughput	342397.315	181356.200	47.033	-1.405
test_mysql_perf_write_only_32_threads_throughput	441030.635	52010.095	88.207	0.696
test_mysql_perf_read_write_1_threads_throughput	15554.425	9066.700	41.710	3.507
test_mysql_perf_read_write_8_threads_throughput	93957.735	68252.050	27.359	4.622
test_mysql_perf_read_write_16_threads_throughput	131710.160	105130.600	20.180	2.442
test_mysql_perf_read_write_32_threads_throughput	344343.865	228817.275	33.550	1.932
test_mysql_perf_read_write_64_threads_throughput	534841.015	177546.425	66.804	33.883

Jenkins Build: 1768
1L Entries - Re-run read-write 64 threads for 66% Degradation seen above	NATIVE-MED	SGX-MED	SGX-DEG	vs v1.8
test_mysql_perf_read_write_64_threads_throughput	538009.520	348355.630	35.251	2.33

Jenkins Build: 1770,1771		vs v1.8
1L Entries - Iteration 2	NATIVE-MED	SGX-MED
test_mysql_perf_read_only_1_threads_throughput	15455.315	8662.020
test_mysql_perf_read_only_8_threads_throughput	91507.845	63547.105
test_mysql_perf_read_only_16_threads_throughput	148689.145	103182.065
test_mysql_perf_read_only_32_threads_throughput	350934.380	237852.720
test_mysql_perf_read_only_64_threads_throughput	537388.775	370112.935
test_mysql_perf_write_only_1_threads_throughput	24523.705	12780.095
test_mysql_perf_write_only_8_threads_throughput	181600.240	85774.305
test_mysql_perf_write_only_16_threads_throughput	344870.980	162835.235
test_mysql_perf_write_only_32_threads_throughput	424808.455	47966.480
test_mysql_perf_read_write_1_threads_throughput	15612.075	9050.750
test_mysql_perf_read_write_8_threads_throughput	93779.870	64107.710
test_mysql_perf_read_write_16_threads_throughput	130981.790	108045.600
test_mysql_perf_read_write_32_threads_throughput	344486.375	228468.765
test_mysql_perf_read_write_64_threads_throughput	542245.025	348646.995

Baseline v1.8 - 50L Entries	Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2
NATIVE-MED	SGX-MED
test_mysql_perf_read_only_1_threads_throughput	13630.890
test_mysql_perf_read_only_8_threads_throughput	80847.230
test_mysql_perf_read_only_16_threads_throughput	120518.190
test_mysql_perf_read_only_32_threads_throughput	297634.295
test_mysql_perf_read_only_64_threads_throughput	383577.050
test_mysql_perf_write_only_1_threads_throughput	14585.950
test_mysql_perf_write_only_8_threads_throughput	67000.030
test_mysql_perf_write_only_16_threads_throughput	114013.800
test_mysql_perf_write_only_32_threads_throughput	209779.760
test_mysql_perf_read_write_1_threads_throughput	11778.100
test_mysql_perf_read_write_8_threads_throughput	64395.785
test_mysql_perf_read_write_16_threads_throughput	116445.890
test_mysql_perf_read_write_32_threads_throughput	217642.830
test_mysql_perf_read_write_64_threads_throughput	141647.585

Jenkins Build: 1761
50L Entries - Iteration 1	NATIVE-MED	SGX-MED	SGX-DEG	vs v1.8
test_mysql_perf_read_only_1_threads_throughput	13804.235	5876.430	57.430	-1.948
test_mysql_perf_read_only_8_threads_throughput	81848.705	24329.660	70.275	-1.059
test_mysql_perf_read_only_16_threads_throughput	120998.570	27386.335	77.366	-0.642
test_mysql_perf_read_only_32_threads_throughput	298116.915	29057.855	90.253	-0.663
test_mysql_perf_read_only_64_threads_throughput	378351.680	29721.110	92.145	-0.689
test_mysql_perf_write_only_1_threads_throughput	14676.205	2278.845	84.473	-0.771
test_mysql_perf_write_only_8_threads_throughput	68323.285	7751.640	88.654	-0.962
test_mysql_perf_write_only_16_threads_throughput	114035.880	10682.290	90.633	-1.056
test_mysql_perf_write_only_32_threads_throughput	212828.555	11832.595	94.440	-0.305
test_mysql_perf_read_write_1_threads_throughput	11936.935	3356.680	71.880	-1.727
test_mysql_perf_read_write_8_threads_throughput	65197.180	10648.325	83.668	-1.583
test_mysql_perf_read_write_16_threads_throughput	117967.725	13326.140	88.704	-1.015
test_mysql_perf_read_write_32_threads_throughput	217463.595	13594.870	93.748	-0.466
test_mysql_perf_read_write_64_threads_throughput	136661.625	14058.490	89.713	-0.727

Jenkins Build: 1769
50L Entries - Iteration 2	NATIVE-MED	SGX-MED	SGX-DEG	vs v1.8
test_mysql_perf_read_only_1_threads_throughput	13776.050	5484.355	60.189	0.811
test_mysql_perf_read_only_8_threads_throughput	82480.455	23060.505	72.041	0.707
test_mysql_perf_read_only_16_threads_throughput	122015.700	26211.155	78.518	0.51
test_mysql_perf_read_only_32_threads_throughput	296535.780	27026.560	90.886	-0.03
test_mysql_perf_read_only_64_threads_throughput	370723.215	27545.410	92.570	-0.264
test_mysql_perf_write_only_1_threads_throughput	14619.545	2120.540	85.495	0.251
test_mysql_perf_write_only_8_threads_throughput	69705.865	6963.970	90.009	0.393
test_mysql_perf_write_only_16_threads_throughput	114879.945	9440.520	91.782	0.093
test_mysql_perf_write_only_32_threads_throughput	210631.345	10696.380	94.922	0.177
test_mysql_perf_read_write_1_threads_throughput	11875.240	3270.235	72.462	-1.145
test_mysql_perf_read_write_8_threads_throughput	64824.990	9560.190	85.252	0.001
test_mysql_perf_read_write_16_threads_throughput	118145.495	12476.845	89.439	-0.28
test_mysql_perf_read_write_32_threads_throughput	218749.295	12359.000	94.350	0.136
test_mysql_perf_read_write_64_threads_throughput	155585.305	13122.245	91.566	1.126

donporter · 2025-05-06T14:46:42Z

MySQL:

Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2	NATIVE-MED	SGX-MED	SGX-DEG
test_mysql_perf_read_only_1_threads_throughput	15151.405	9368.19	38.169
test_mysql_perf_read_only_8_threads_throughput	87322.72	66160.56	24.234
test_mysql_perf_read_only_16_threads_throughput	138210.065	97773.83	29.257
test_mysql_perf_read_only_32_threads_throughput	342504.235	244459.3	28.626
test_mysql_perf_read_only_64_threads_throughput	535151.34	388273.4	27.446
test_mysql_perf_write_only_1_threads_throughput	24434.035	14540.29	40.492
test_mysql_perf_write_only_8_threads_throughput	183235.62	97297.68	46.9
test_mysql_perf_write_only_16_threads_throughput	345821.53	178312.4	48.438
test_mysql_perf_write_only_32_threads_throughput	426137.82	53220.43	87.511
test_mysql_perf_read_write_1_threads_throughput	15440.38	9541.655	38.203
test_mysql_perf_read_write_8_threads_throughput	88702.63	68534.48	22.737
test_mysql_perf_read_write_16_threads_throughput	126891.11	104383.3	17.738
test_mysql_perf_read_write_32_threads_throughput	335632.33	229513.3	31.618
test_mysql_perf_read_write_64_threads_throughput	534319.58	358416.9	32.921

Run on PR 2082: enable_recovery = true	vs v1.8
Jenkins build # 1738	NATIVE-AVG
test_mysql_perf_read_only_1_threads_throughput	15395.135
test_mysql_perf_read_only_8_threads_throughput	91478.270
test_mysql_perf_read_only_16_threads_throughput	150844.485
test_mysql_perf_read_only_32_threads_throughput	350968.815
test_mysql_perf_write_only_8_threads_throughput	181598.470
test_mysql_perf_read_write_1_threads_throughput	15679.195
test_mysql_perf_read_write_32_threads_throughput	342299.470

Run on PR 2082: enable_recovery = false	vs v1.8
Jenkins build # 1740	NATIVE-AVG
test_mysql_perf_read_only_1_threads_throughput	15426.655
test_mysql_perf_read_only_8_threads_throughput	91843.455
test_mysql_perf_read_only_16_threads_throughput	147118.590
test_mysql_perf_read_only_32_threads_throughput	351930.190
test_mysql_perf_read_only_64_threads_throughput	539516.645
test_mysql_perf_write_only_1_threads_throughput	24729.910
test_mysql_perf_write_only_8_threads_throughput	181264.665
test_mysql_perf_write_only_16_threads_throughput	343255.675
test_mysql_perf_write_only_32_threads_throughput	305638.615
test_mysql_perf_read_write_1_threads_throughput	15765.225
test_mysql_perf_read_write_8_threads_throughput	94510.755
test_mysql_perf_read_write_16_threads_throughput	129366.075
test_mysql_perf_read_write_32_threads_throughput	343008.765
test_mysql_perf_read_write_64_threads_throughput	536177.690

Run on PR 2082: enable_recovery not set explicitly	vs v1.8
Jenkins build # 1742	NATIVE-AVG
test_mysql_perf_read_only_1_threads_throughput	15333.340
test_mysql_perf_read_only_8_threads_throughput	92465.825
test_mysql_perf_read_only_16_threads_throughput	148105.220
test_mysql_perf_read_only_32_threads_throughput	351063.265
test_mysql_perf_read_only_64_threads_throughput	543092.535
test_mysql_perf_write_only_1_threads_throughput	24281.175
test_mysql_perf_write_only_8_threads_throughput	181233.285
test_mysql_perf_write_only_16_threads_throughput	345580.970
test_mysql_perf_write_only_32_threads_throughput	361273.435
test_mysql_perf_read_write_1_threads_throughput	15642.780
test_mysql_perf_read_write_8_threads_throughput	94180.560
test_mysql_perf_read_write_16_threads_throughput	132547.650
test_mysql_perf_read_write_32_threads_throughput	343140.480
test_mysql_perf_read_write_64_threads_throughput	535599.385

donporter · 2025-05-06T14:48:05Z

Tensorflow Encrypted (last one):

Baseline - Gramine v1.8 Ubuntu 22.04 Kernel 6.2	NATIVE-MED	SGX-MED	DIRECT-MED	SGX-DEG	DIRECT-DEG
test_tf_perf_bert_throughput	11.3	11.011	11.09	2.558	1.858
test_tf_perf_resnet_bs_1_throughput	513.795	252.554	492.336	50.845	4.177
test_tf_perf_resnet_bs_16_throughput	1546.135	1234.636	1494.313	20.147	3.352
test_tf_perf_resnet_bs_512_throughput	1633.193	1594.986	1619.003	2.339	0.869

Run on PR 2082: enable_recovery = true
Jenkins build # 1725	NATIVE-MED
test_tf_perf_bert_throughput	11.309
test_tf_perf_resnet_bs_1_throughput	514.115
test_tf_perf_resnet_bs_16_throughput	1545.122
test_tf_perf_resnet_bs_512_throughput	1622.118

Run on PR 2082: enable_recovery = false
Jenkins build # 1730	NATIVE-MED
test_tf_perf_bert_throughput	11.253
test_tf_perf_resnet_bs_1_throughput	513.250
test_tf_perf_resnet_bs_16_throughput	1544.878
test_tf_perf_resnet_bs_512_throughput	1624.490

Run on PR 2082: enable_recovery not set explicitly
Jenkins build # 1725	NATIVE-MED
test_tf_perf_bert_throughput	11.285
test_tf_perf_resnet_bs_1_throughput	515.421
test_tf_perf_resnet_bs_16_throughput	1543.503
test_tf_perf_resnet_bs_512_throughput	1622.662

mkow

Thanks @donporter!

@vasanth-intel: A few questions:

What are the units of these numbers?
And what's MED and DEG?
What does "vs v1.8" mean? Is this v18_perf / new_perf, v18_perf / new_perf * 100%, v18_perf - new_perf or something else?

And overall, please be more precise with everything, these numbers can mean anything without more information.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @cloudnoize, @jinengandhi-intel, and @ynonflumintel)

vasanth-intel · 2025-05-07T10:41:27Z

@mkow Here are my responses to the questions above.

What are the units of these numbers?
OpenVINO Throughput - FPS (frames per second)
OpenVINO Latency - ms (inference time in milliseconds)
SPECpower - overall ssj_ops/watt (overall server-side workload operations performed per watt of power consumed)
MongoDB - ops_per_sec (operations per second)
MySQL Throughput - queries per second
MySQL Latency - ms (milliseconds)
TensorFlow Throughput - images per second

And what's MED and DEG?
MED stands for Median. We run the perf benchmarking command of every test for 10 iterations. 'MED' is the median of the results from these 10 iterations.
DEG stands for degradation, i.e., the Gramine SGX degradation with respect to Linux native.
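
As an illustration of the MED column, the following is a minimal sketch of taking the median over the results of the 10 iterations. The function names are hypothetical, and the thread does not say how the median of an even number of samples is computed; the sketch assumes the conventional mean of the two middle values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending order. */
int compare_doubles(const void* a, const void* b) {
    double x = *(const double*)a;
    double y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place and returns the median.
 * Assumption: for an even count, the median is the mean of the two middle values. */
double median_of_runs(double* samples, size_t count) {
    assert(count > 0);
    qsort(samples, count, sizeof(*samples), compare_doubles);
    if (count % 2 == 1)
        return samples[count / 2];
    return (samples[count / 2 - 1] + samples[count / 2]) / 2.0;
}
```

With 10 iterations the count is even, so under this assumption MED would be the mean of the 5th and 6th smallest results.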

What does "vs v1.8" mean? Is this v18_perf / new_perf, v18_perf / new_perf * 100%, v18_perf - new_perf or something else?
It shows how much more or less degradation we see compared with the Gramine v1.8 results. For example, each sheet in PR_2082_Feb_25_Performance_Results.xlsx has the Gramine v1.8 results as the baseline at the top, followed by the results for the feature being tested. The 'vs v1.8' column in every sheet compares the Gramine SGX degradation (SGX-DEG) of the current run (with the feature enabled or disabled) with the SGX-DEG of the baseline.

Hope the above response helps.

mkow

Gramine SGX degradation with respect to Linux native.

But what does it mean exactly? Is it a difference? A ratio? Something else?

It shows how much more or less degradation we see compared with the Gramine v1.8 results. For example, each sheet in PR_2082_Feb_25_Performance_Results.xlsx has the Gramine v1.8 results as the baseline at the top, followed by the results for the feature being tested. The 'vs v1.8' column in every sheet compares the Gramine SGX degradation (SGX-DEG) of the current run (with the feature enabled or disabled) with the SGX-DEG of the baseline.

Please read my question again. Also, I don't have MS Excel, I can't open it. Please include all the data you want to quote here.

vasanth-intel · 2025-05-08T12:53:35Z

Gramine SGX degradation with respect to Linux native.
But what does it mean exactly? Is it a difference? A ratio? Something else?

@mkow For the data in the tables above, we use the following formula to compute the SGX degradation:
SGX-DEG = ((NATIVE-MED - SGX-MED) / NATIVE-MED) * 100
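
The formula above can be sketched as a small C helper. The function name is hypothetical and not part of the benchmarking scripts; the arithmetic is exactly the formula as stated.

```c
#include <assert.h>

/* SGX-DEG = ((NATIVE-MED - SGX-MED) / NATIVE-MED) * 100, returned in percent.
 * Hypothetical helper name; only the formula itself comes from the thread. */
double sgx_degradation_pct(double native_med, double sgx_med) {
    assert(native_med > 0.0); /* a zero native median would make the ratio undefined */
    return (native_med - sgx_med) / native_med * 100.0;
}
```

For the baseline row `test_tf_perf_resnet_bs_1_throughput` (NATIVE-MED 513.795, SGX-MED 252.554) this yields roughly 50.845, which agrees with the SGX-DEG value listed in the TensorFlow baseline table.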

mkow

Ah, then all the values in "DEG" columns are missing the percent sign?
Also, what's the unit and meaning of the "vs v1.8" column? You still didn't explain that.

@kailun-qin: This looks good, actually quite too good - if I read it correctly, the degradation is effectively less than the variance of the measurements. Are these tests using a lot of protected files? How do they work? Are they writing to protected files? Maybe they only read inputs from protected files?

vasanth-intel · 2025-05-28T06:05:17Z

Ah, then all the values in "DEG" columns are missing the percent sign?
Also, what's the unit and meaning of the "vs v1.8" column? You still didn't explain that.

Yes, the values in the 'DEG' columns are the SGX degradation percentages of the corresponding tests with respect to Linux native. The 'vs v1.8' column is the difference between the SGX degradation of the current run and that of the Gramine v1.8 baseline.
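
A minimal sketch of that comparison, under stated assumptions: the helper names are hypothetical, and the sign convention (current run minus baseline, so a positive value means more degradation than v1.8) is an assumption, since the thread only says "difference". Because both operands are percentages, the result is in percentage points.

```c
#include <assert.h>

/* Degradation of SGX relative to native, in percent (formula from the thread). */
double degradation_pct(double native_med, double sgx_med) {
    assert(native_med > 0.0);
    return (native_med - sgx_med) / native_med * 100.0;
}

/* "vs v1.8" value in percentage points.
 * Assumption: computed as current SGX-DEG minus baseline SGX-DEG. */
double vs_baseline_points(double base_native, double base_sgx,
                          double cur_native, double cur_sgx) {
    return degradation_pct(cur_native, cur_sgx) - degradation_pct(base_native, base_sgx);
}
```

With made-up inputs, a baseline of 100 native and 80 SGX (20% degradation) against a current run of 100 native and 75 SGX (25% degradation) gives about 5 percentage points.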

@kailun-qin: This looks good, actually quite too good - if I read it correctly, the degradation is effectively less than the variance of the measurements. Are these tests using a lot of protected files? How do they work? Are they writing to protected files? Maybe they only read inputs from protected files?

As part of the performance benchmarking for this PR, we ran everything that we generally run for a Gramine release. This includes performance tests such as MySQL, which exercises both read and write operations on an encrypted DB. We tried different configurations, such as reading/writing 100k entries and 500k entries, but did not see any degradation. We also shared this data with Kailun and did not get any requests for further experiments.

kailun-qin

Yeah, I reviewed the manifest configs for the benchmarking tests from Jinen, and they look good to me.

Previously, a fatal error during writes to encrypted files could cause file corruption due to incorrect GMACs and/or encryption keys.

To address this, we introduce a file recovery mechanism using a "shadow" recovery file that stores data about to change and a "has-pending-write" flag in the metadata node indicating the start of a write transaction. During file flush, all cached blocks that are about to change are saved to the recovery file in the format of physical node numbers (offsets) plus encrypted block data. Before saving the main file contents, the "has-pending-write" flag is set in the file's metadata node and cleared only when the transaction is complete. If an encrypted file is opened and the "has-pending-write" flag is set, a recovery process starts to revert partial changes using the recovery file, returning to the last known good state. The "shadow" recovery file is cleaned up on file close.

This commit adds a new mount parameter `enable_recovery = [true|false]` for encrypted files mounts to optionally enable this feature. We extend the file flush logic of protected files (pf) to include the recovery file dump and the setting/unsetting of the "has-pending-write" flag. We also extend `pf_open()` to make the pf aware of the underlying recovery file managed by LibOS, and to include an optional recovery check and initiate recovery if needed.

Additionally, it automatically adapts encrypted files with older metadata formats to the new format for backward compatibility.

Signed-off-by: Kailun Qin <[email protected]>
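
The transaction described in the commit message can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: every type and function name below is hypothetical, encryption and GMACs are left out entirely, and the "recovery file" is a plain array rather than a file managed by LibOS. It is not the code from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_SIZE 8 /* toy node size, chosen only for illustration */
#define MAX_NODES 4

/* Toy stand-in for the main file: the pending-write flag plus fixed-size nodes. */
struct toy_file {
    bool has_pending_write;
    uint8_t nodes[MAX_NODES][NODE_SIZE];
};

/* One recovery record: physical node number plus that node's old contents. */
struct recovery_record {
    uint64_t node_number;
    uint8_t old_data[NODE_SIZE];
};

/* Toy stand-in for the "shadow" recovery file. */
struct recovery_log {
    size_t count;
    struct recovery_record records[MAX_NODES];
};

/* Save the current contents of every node that is about to change. */
void save_old_nodes(const struct toy_file* file, const uint64_t* dirty, size_t count,
                    struct recovery_log* log) {
    assert(count <= MAX_NODES);
    log->count = 0;
    for (size_t i = 0; i < count; i++) {
        assert(dirty[i] < MAX_NODES);
        struct recovery_record* rec = &log->records[log->count++];
        rec->node_number = dirty[i];
        memcpy(rec->old_data, file->nodes[dirty[i]], NODE_SIZE);
    }
}

/* Set the flag, write the new node contents, then clear the flag. `fail_after`
 * simulates a fatal error after that many node writes, leaving the flag set. */
bool flush_nodes(struct toy_file* file, const uint64_t* dirty, const uint8_t* new_data,
                 size_t count, size_t fail_after) {
    file->has_pending_write = true;
    for (size_t i = 0; i < count; i++) {
        if (i == fail_after)
            return false; /* transaction interrupted, flag stays set */
        memcpy(file->nodes[dirty[i]], new_data + i * NODE_SIZE, NODE_SIZE);
    }
    file->has_pending_write = false;
    return true;
}

/* On open: if the flag is set, restore the saved nodes and clear the flag.
 * Returns true if a rollback was performed. */
bool recover_if_needed(struct toy_file* file, const struct recovery_log* log) {
    if (!file->has_pending_write)
        return false;
    for (size_t i = 0; i < log->count; i++) {
        const struct recovery_record* rec = &log->records[i];
        memcpy(file->nodes[rec->node_number], rec->old_data, NODE_SIZE);
    }
    file->has_pending_write = false;
    return true;
}
```

Calling `save_old_nodes()` before `flush_nodes()` mirrors the ordering in the description: the old blocks are recorded first, so an interrupted flush can always be rolled back to the last known good state. Passing `SIZE_MAX` as `fail_after` models a flush that completes without error.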

kailun-qin

Reviewed 5 of 27 files at r1, 1 of 27 files at r2, 1 of 18 files at r3, 1 of 3 files at r4, 2 of 4 files at r5, 1 of 1 files at r6, 1 of 8 files at r7, 7 of 7 files at r9, all commit messages.
Reviewable status: 20 of 34 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @cloudnoize, @efu39, @jinengandhi-intel, @mkow, and @ynonflumintel)

mkow

Reviewed 14 of 14 files at r10, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @cloudnoize and @ynonflumintel)

mkow

Dismissed @cloudnoize and @ynonflumintel from 4 discussions.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @kailun-qin)

mkow · 2025-06-03T15:03:15Z

common/src/protected_files/protected_files_format.h

@@ -56,6 +56,7 @@ typedef struct {
    uint8_t    minor_version;
    pf_nonce_t metadata_key_nonce;
    pf_mac_t   metadata_mac; /* GCM mac */
+    uint8_t    has_pending_write; /* flag for file recovery */


FYI: I dismissed @cloudnoize's block here, as they are not a maintainer and are not responding.

mkow · 2025-06-03T15:03:16Z

libos/include/libos_fs.h

+
+    /* Whether to enable file recovery (used by `chroot_encrypted` filesystem), false if not
+     * applicable */
+    bool enable_recovery;


FYI: I dismissed @ynonflumintel's block here, as they are not a maintainer and are not responding.

mkow · 2025-06-03T15:03:16Z

libos/include/libos_fs_encrypted.h

 *
 * `uri` must not correspond to an existing file.
 *
 * The newly created `libos_encrypted_file` object will have `use_count` set to 1.
 */
 int encrypted_file_create(const char* uri, mode_t perm, struct libos_encrypted_files_key* key,
-                          struct libos_encrypted_file** out_enc);
+                          bool enable_recovery, struct libos_encrypted_file** out_enc);


FYI: I dismissed @ynonflumintel's block here, as they are not a maintainer and are not responding.

mkow · 2025-06-03T15:03:16Z

pal/src/host/linux-common/file_utils.c

+    }
+
+    for (size_t i = 0; i < nodes_count; i++) {
+        ret = read_all(recovery_file_fd, recovery_node, recovery_node_size);


FYI: I dismissed @ynonflumintel's block here, as they are not a maintainer and are not responding.

mkow

Reviewable status: complete! all files reviewed, all discussions resolved

ynonflumintel reviewed Jan 9, 2025

efu39 reviewed Jan 17, 2025

mkow reviewed Feb 11, 2025

kailun-qin force-pushed the kailun-qin/add-encrypted-file-recovery branch 2 times, most recently from 00a90f3 to 4f28995, February 11, 2025 11:50

kailun-qin commented Feb 11, 2025

kailun-qin force-pushed the kailun-qin/add-encrypted-file-recovery branch 2 times, most recently from 0eb5ec3 to 9d29158, February 11, 2025 12:34

mkow reviewed Feb 11, 2025

mkow reviewed Feb 12, 2025

kailun-qin changed the title ~~[PAL,LibOS,common] Add file recovery support for encrypted files~~ [LibOS,common] Add file recovery support for encrypted files Feb 13, 2025

kailun-qin commented Feb 13, 2025

kailun-qin force-pushed the kailun-qin/add-encrypted-file-recovery branch from fbd5c2a to 9b203b7, February 13, 2025 09:20

mkow reviewed Feb 14, 2025

kailun-qin commented Feb 14, 2025

mkow reviewed Feb 14, 2025

kailun-qin marked this pull request as ready for review February 17, 2025 06:03

kailun-qin commented Feb 17, 2025

mkow requested changes Feb 17, 2025

efu39 reviewed Feb 26, 2025

kailun-qin commented Mar 7, 2025

mkow requested changes Mar 8, 2025

kailun-qin commented Mar 10, 2025

mkow requested changes Mar 10, 2025

kailun-qin commented Mar 10, 2025

mkow reviewed Mar 10, 2025

cloudnoize reviewed Mar 11, 2025

mkow reviewed Mar 18, 2025

kailun-qin commented Apr 13, 2025

kailun-qin commented Apr 14, 2025

mkow reviewed Apr 15, 2025

mkow reviewed May 6, 2025

mkow reviewed May 7, 2025

mkow reviewed May 20, 2025

kailun-qin commented Jun 3, 2025

kailun-qin force-pushed the kailun-qin/add-encrypted-file-recovery branch from 5632109 to 2a41e06, June 3, 2025 14:26

kailun-qin force-pushed the kailun-qin/add-encrypted-file-recovery branch from 2a41e06 to 6808373, June 3, 2025 14:27

kailun-qin commented Jun 3, 2025

mkow approved these changes Jun 3, 2025

kailun-qin merged commit 6808373 into gramineproject:master Jun 3, 2025
27 checks passed

    if (!dent->inode) {
        if (!(flags & O_CREAT)) {
            ret = -ENOENT;
            goto out;
        }
