
PyTorch with the debug flag fails with a segmentation fault on the latest Gramine and GSC master #87

@anjalirai-intel

The Gramine commit ([PAL/{Linux,Linux-SGX}], which added trace logs for raw syscalls) introduced a bug. As a result, the PyTorch build with the debug flag fails with a segmentation fault when the GSC master branch is used.

This issue occurs with the latest versions of Gramine and GSC. To reproduce it, you need to modify curation_script.sh, because by default the contrib repository uses the tagged versions of GSC and Gramine.

Steps to reproduce:

  1. Generate an OpenSSL key: openssl genrsa -3 -out enclave-key.pem 3072
  2. Clone the contrib repository: git clone https://github.com/gramineproject/contrib.git
  3. Apply the changes shown below to Intel-Confidential-Compute-for-X/util/curation_script.sh
  4. Run base_image_helper.sh
  5. Execute: python3 curate.py pytorch pytorch-encrypted
  6. Provide the signing key path generated in Step 1 and wait for the build to finish
  7. Run the commands from commands.txt

Curation Script Updates:

diff --git a/Intel-Confidential-Compute-for-X/util/curation_script.sh b/Intel-Confidential-Compute-for-X/util/curation_script.sh
index 039813c..81fc3ce 100755
--- a/Intel-Confidential-Compute-for-X/util/curation_script.sh
+++ b/Intel-Confidential-Compute-for-X/util/curation_script.sh
@@ -126,8 +126,9 @@ create_gsc_image () {
     rm -rf gsc >/dev/null 2>&1
     git clone https://github.com/gramineproject/gsc.git
     cd gsc
-    git checkout $(git tag --list 'v*.*' --sort=taggerdate | tail -1)
+    git checkout master
     cp -f config.yaml.template config.yaml
+    sed -i "s/Branch.*master.*\|Branch.*v1.7.*/Branch: 'b6a2d79b641aed7a52220246ad238d241a6fc995'/" config.yaml
     sed -i 's|ubuntu:.*|'$distro'"|' config.yaml
 
     ./gsc build $cmdline_flag --buildtype $1 $app_image_x  $WORKLOAD_DIR/$app_image_manifest

By following these steps, you should be able to reproduce the segmentation fault issue with the Gramine commit and GSC master branch.
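The sed line added in the diff above rewrites the Gramine Branch entry in config.yaml so that GSC builds a specific commit instead of master or v1.7. A minimal sketch of that substitution, run against a throwaway file, may help confirm the pin took effect before starting a long build. The helper name pin_gramine_commit and the sample file contents are assumptions made for illustration (the real config.yaml.template may be laid out differently), and the sketch assumes GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the Branch entry of a GSC-style config file at a
# specific Gramine commit, using the same sed expression as the diff above.
# Assumes GNU sed (for -i without a suffix and for \| alternation).
pin_gramine_commit() {
    local config_file="$1" commit="$2"
    sed -i "s/Branch.*master.*\|Branch.*v1.7.*/Branch: '${commit}'/" "$config_file"
}

# Demo on a temporary file. The contents are an assumed sample, not the
# actual config.yaml.template shipped with GSC.
demo_config="$(mktemp)"
printf 'Gramine:\n    Branch: "master"\n' > "$demo_config"
pin_gramine_commit "$demo_config" b6a2d79b641aed7a52220246ad238d241a6fc995
cat "$demo_config"
rm -f "$demo_config"
```

Running `grep Branch config.yaml` inside the gsc directory after the script's sed step is a quick way to check that the commit hash, rather than a branch name, is what GSC will check out.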

Error:

(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(host_exception.c:141:handle_sync_signal) error: Unexpected segmentation fault (SIGSEGV) occurred inside untrusted PAL (libc-2.31.so+0x9a1fe (addr = 0x7f3b1476a1fe))
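On x86-64 Linux, syscall number 202 is futex, so the trace lines above show Gramine repeatedly emulating a raw futex call issued from libgomp. When triaging the full debug log, it can help to count which raw syscalls were emulated and from where. The following is a rough sketch, assuming the trace lines keep the exact wording shown above; the function name summarize_raw_syscalls is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Emulating raw syscall" trace lines per
# (syscall number, call site). Reads the named files, or stdin if none given.
summarize_raw_syscalls() {
    awk '/Emulating raw syscall instruction with number/ {
        for (i = 1; i < NF; i++) {
            if ($i == "number")  nr = $(i + 1)     # syscall number, e.g. 202
            if ($i == "address") site = $(i + 1)   # library+offset of the call
        }
        count[nr " " site]++
    }
    END { for (k in count) print count[k], k }' "$@" | sort -k1,1nr -k2
}

# Demo using the three trace lines quoted in this report.
summarize_raw_syscalls <<'EOF'
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
(pal_exception.c:237:handle_ud) trace: Emulating raw syscall instruction with number 202 at address libgomp.so.1+0x1fccc (addr = 0x2cbd3ccc)
EOF
# prints: 3 202 libgomp.so.1+0x1fccc
```

The same function can be pointed at the attached log, for example `summarize_raw_syscalls test_pytorch_default_with_debug_console.log`, to see whether the futex call in libgomp is the only raw syscall being emulated before the crash.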

This issue is consistently reproducible on an Azure system, and it can also be seen intermittently for other workloads on different hosts. I have attached the debug logs as well.

test_pytorch_default_with_debug_console.log
