Skip to content

Ds08msa internal v0.4 #619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: dasharo-24.02.1
Choose a base branch
from
Open

Conversation

pietrushnic
Copy link
Contributor

@pietrushnic pietrushnic commented Feb 22, 2025

Rebased and moved to new branch name from #573

We have to decide what changes here make sense. I already wrote a justification for the use of Dasharo SDK here: Dasharo/dasharo-sdk#9

In the long run, I hope we can replace dasharo-sdk with something like stageX.

@pietrushnic pietrushnic requested a review from miczyg1 February 22, 2025 12:48
@pietrushnic
Copy link
Contributor Author

pietrushnic commented Feb 22, 2025

@miczyg1 @SergiiDmytruk I get merge conflict because of CONFIG_EDK2_TAG_OR_REV change. I'm unsure if I can afford to use untested ref for training; I prefer not. My preference is to use v0.9.0 tag.

What is our policy in such a situation? In theory, this PR should not be reviewed because of a merge conflict.

EDIT: now I see: 7dbfe58ba5dc08e07c253e53b5c1bfed7758ddf6 is not a valid git reference but it was during the Odroid v0.9.0 release.
EDIT2: this is a problem with my fancy configuration with EDKII local path

@SergiiDmytruk
Copy link
Member

I get merge conflict because of CONFIG_EDK2_TAG_OR_REV change.

I think all boards are updated at the same time largely because not doing it can cause a board to fail building as a result EDK2-related changes in coreboot.

I'm unsure if I can afford to use untested ref for training; I prefer not. My preference is to use v0.9.0 tag.

Tag will use an older revision anyway, so I don't really see a problem: tag uses tested EDK, dasharo uses latest EDK.

@pietrushnic
Copy link
Contributor Author

I think all boards are updated at the same time largely because not doing it can cause a board to fail building as a result EDK2-related changes in coreboot.

Understood. I probably should better split this PR between what I relay on for training vs what could be useful for others. Obviously my work cannot be proven by reproducibility because we don't build timeless and hash of current commit is taken into build.

So what I'm really looking is advise how this PR should be submitted, because first I should not request review if I have merge request (we know how it would end), and second I would like not to broke other uses cases. AFAIK we don't have any tests to confirm if this script keep working as expected.

Tag will use an older revision anyway, so I don't really see a problem: tag uses tested EDK, dasharo uses latest EDK.

You're right somehow I though that was hash before rebase, but it was my mistake instead of hash which disapeared due to rebase. Sorry for the noise.

@miczyg1
Copy link
Contributor

miczyg1 commented Feb 24, 2025

Obviously my work cannot be proven by reproducibility because we don't build timeless and hash of current commit is taken into build.

Why? If we build from the same commit using the same SDK, it should give the same result. Why should we care if N code revisions/commits give the same hash or not? To me a different commit means different code and shouldn't give the same hash.

@SergiiDmytruk
Copy link
Member

So what I'm really looking is advise how this PR should be submitted, because first I should not request review if I have merge request (we know how it would end), and second I would like not to broke other uses cases. AFAIK we don't have any tests to confirm if this script keep working as expected.

I'd just rebase this branch onto dasharo. build.sh is used by CI to build Protectli boards. You're not very likely to break anything with the current copy&paste style of the script.

@pietrushnic
Copy link
Contributor Author

pietrushnic commented Feb 24, 2025

Why? If we build from the same commit using the same SDK, it should give the same result. Why should we care if N code revisions/commits give the same hash or not? To me a different commit means different code and shouldn't give the same hash.

For years, I've disliked coreboot-sdk and how it impacts the whole building story in the long run, including the reproducibility of older builds. We essentially can't achieve this with the coreboot build environment: https://reproducible-builds.org/docs/perimeter/
This has to be changed.

For various reasons, I cannot build from the same SDK:

  • coreboot-sdk:2024-02-18_732134932b has 7.46GB - this is too much when I have to use it for a reproducible training environment - here I made some steps to minimize it, and I think we can create even more minimalistic toolchain
  • coreboot-sdk is not reproducible, dasharo-sdk also, but dasharo-sdk would be an abstraction which we replace with stagex (or something better if there will be a candidate)

For various reasons, I may have issues with building from the same commit - dasharo-pq.

The code here makes no change to Odroid-H4+. I changed only the toolchain (not really since I still rely on the same compiler from coreboot-sdk 2024-02-18_732134932b). I'm getting a different hash because we consider the hash of a commit something that should be part of the final binary instead of using BUILD_TIMELESS.

Maybe this is not important to you; maybe my use cases are niche and unnecessary, but this PR contains a couple of things that help me:

  • speed up the build process by not fetching edk2 but having copy locally,
  • no Internet access is needed if things are correctly prepared repository TBH, I need two things: check iPXE and correctly provide a patch for EDKII before entering --network none in my repo,
  • I can use DASHARO_SDK, which I like for a given build,

Some of those features have generic value for the whole project, so I'm asking whether and how I could contribute to those changes. Or, more directly, if needed, how to split this PR so some of those required features can be merged without affecting everything else (hence the lack of testability in this area of code?).

EDIT: I edited airgap requirement

@pietrushnic
Copy link
Contributor Author

@miczyg1 @SergiiDmytruk ?

@miczyg1
Copy link
Contributor

miczyg1 commented Mar 3, 2025

@pietrushnic This script build.sh is for mere mortals, not devs. It is supposed to make building simple and robust. That is why:

  1. We CAN'T change distclean to clean, it is not safe. Distclean is required for switching boards. Imagine somebody builds board X then builds board Y from the same coreboot directory. It will lead to unexpected problems and misconfiguration. Clean can only be used when not switching boards and ODROID H4 isn't the only board supported by us. So you will have to keep distclean, but download payload source each time when build.sh is called.
  2. We CAN change to dasharo-sdk, because it has all the xgccs we need. No problem.
  3. We CAN check out the payloads' sources before starting the build and after creating full config with make olddefconfig. Doesn't really matter if the final make does it, or it is done before that. So generally --network none is fine.

The code here makes no change to Odroid-H4+. I changed only the toolchain (not really since I still rely on the same compiler from coreboot-sdk 2024-02-18_732134932b). I'm getting a different hash because we consider the hash of a commit something that should be part of the final binary instead of using BUILD_TIMELESS.

And how BUILD_TIMELESS helps if you change the SDK in build.sh? It will produce a different binary from the same code, because the toolchain was different, but it went undetected, because of BUILD_TIMELESS. Building with build.sh and without BUILD_TIMELESS ensures that everybody build the binaries the same way we do. We also avoid entering a support hell where everybody has a different way of building binaries.

@pietrushnic
Copy link
Contributor Author

We CAN'T change distclean to clean, it is not safe. Distclean is required for switching boards. Imagine somebody builds board X then builds board Y from the same coreboot directory. It will lead to unexpected problems and misconfiguration. Clean can only be used when not switching boards and ODROID H4 isn't the only board supported by us. So you will have to keep distclean, but download payload source each time when build.sh is called.

Fair point. I can do that modification. I can make it transparent for the user by:

  1. Cloning edk2 in build script - this can be done by parsing config files or introducing checkout step to edk2 makefiles
  2. checkout ipxe with the network before doing the airgap build

Is that acceptable?

And how BUILD_TIMELESS helps if you change the SDK in build.sh? It will produce a different binary from the same code, because the toolchain was different, but it went undetected, because of BUILD_TIMELESS.

To clarify:

  • The toolchain is not different. I'm using the same coreboot-sdk, but instead of taking the container, I'm just extracting what is used. So, GCC and other toolchain components are used in the build without dasharo-sdk.
  • The built environment should not have an impact on the build result. That would be the ideal situation. It is not impossible. If the build environment impacts build results, we should have a reproducible build environment. If the build environment impacts build results and we cannot reproduce the build environment, the design is broken.
  • Also, Martin Roth proved at some point that multiple toolchain versions generate the same binary hash, and a new toolchain can build old code. Yes, this is luck and not something we should rely on. It is documented in leadership notes, IIRC. This is not an argument, just a point such things are possible and maybe even more common than we suspect.

To the point how BUILD_TIMELESS helps, lack of it does not help:

  • Right now, because of not using BUILD_TIMELESS changes around the toolchain cannot be verified because it does not matter if we use the same toolchain or not, and no matter if it introduces a change to the build result, we get a different build result hash because git hash is in binary. So, we cannot verify if the changes introduced are because of toolchain changes.

We could also consider how the source version control system is part of the build environment. It is no longer part of the build environment source code that should be able to exist with or without SVC. The fact that the build environment is mangled with SVC is a terrible design, leading to issues described above, such as the same code with the same compiler giving you different build results. Please consider how crazy it is that zero change in code gives different build results because even git amend with no change to code would lead to a change in build result hash. This is very fragile.

I understand this can be out of the scope of the build.sh, and that is fine. I also know that those problems may not be important to Dasharo developers with different issues. I'm building awareness and a path for my use cases, not being the enemy of the primary use case but being able to coexist.

Building with build.sh and without BUILD_TIMELESS ensures that everybody build the binaries the same way we do. We also avoid entering a support hell where everybody has a different way of building binaries.

We already have situations where not everyone builds the binaries similarly, but let's leave it alone. You are correct that this is most common and adopted for most Dasharo-supported hardware build processes. We can also agree that if it doesn't work for me, it is my problem, so Dasharo developers should not get more work from my use cases (training, dasharo-pq). That's why I'm asking for feedback on how to achieve that in a non-interfered way. I could create my build.sh? I need that solution upstream because I build the whole infrastructure on that (PET and dasharo-pq).

@miczyg1
Copy link
Contributor

miczyg1 commented Mar 3, 2025

We CAN'T change distclean to clean, it is not safe. Distclean is required for switching boards. Imagine somebody builds board X then builds board Y from the same coreboot directory. It will lead to unexpected problems and misconfiguration. Clean can only be used when not switching boards and ODROID H4 isn't the only board supported by us. So you will have to keep distclean, but download payload source each time when build.sh is called.

Fair point. I can do that modification. I can make it transparent for the user by:

  1. Cloning edk2 in build script - this can be done by parsing config files or introducing checkout step to edk2 makefiles

It would simply need make -C payloads/external/edk2 payloads/external/edk2/workspace/Dasharo or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/Dasharo (not sure what CURDIR result will be). I don't think you have to modify Makefile. Also make -C payloads/external/edk2 payloads/external/edk2/workspace/edk2-platforms or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/edk2-platforms if edk2-platforms is used (can be obtained from config too)

  1. checkout ipxe with the network before doing the airgap build

Is that acceptable?

Yes.

And how BUILD_TIMELESS helps if you change the SDK in build.sh? It will produce a different binary from the same code, because the toolchain was different, but it went undetected, because of BUILD_TIMELESS.

To clarify:

  • The toolchain is not different. I'm using the same coreboot-sdk, but instead of taking the container, I'm just extracting what is used. So, GCC and other toolchain components are used in the build without dasharo-sdk.

Not in this particular case, but I was saying rather hypothetically.

  • The built environment should not have an impact on the build result. That would be the ideal situation. It is not impossible. If the build environment impacts build results, we should have a reproducible build environment. If the build environment impacts build results and we cannot reproduce the build environment, the design is broken.

I'm not an GCC expert, but newer GCC version may have different features, optimization, security fixes which may impact how final binary looks like after compilation. That's what I am trying to say.

  • Also, Martin Roth proved at some point that multiple toolchain versions generate the same binary hash, and a new toolchain can build old code. Yes, this is luck and not something we should rely on. It is documented in leadership notes, IIRC. This is not an argument, just a point such things are possible and maybe even more common than we suspect.

I can't agree with and a new toolchain can build old code.. Maybe it will build coreboot, but it is easy to break the build of a payload with newer toolchain.

To the point how BUILD_TIMELESS helps, lack of it does not help:

  • Right now, because of not using BUILD_TIMELESS changes around the toolchain cannot be verified because it does not matter if we use the same toolchain or not, and no matter if it introduces a change to the build result, we get a different build result hash because git hash is in binary. So, we cannot verify if the changes introduced are because of toolchain changes.

Why? You can do BUILD_TIMELESS using any toolchain you want and have that binary as a reference, then do whatever you want with toolchain, use BUILD_TIMELESS again and compare the results. You are not forced to use build.sh, manually invoking commands will work too. As simple as that.

We could also consider how the source version control system is part of the build environment. It is no longer part of the build environment source code that should be able to exist with or without SVC. The fact that the build environment is mangled with SVC is a terrible design, leading to issues described above, such as the same code with the same compiler giving you different build results. Please consider how crazy it is that zero change in code gives different build results because even git amend with no change to code would lead to a change in build result hash. This is very fragile.

But at least the BIOS release date in SMBIOS is correct :)

We already have situations where not everyone builds the binaries similarly, but let's leave it alone. You are correct that this is most common and adopted for most Dasharo-supported hardware build processes. We can also agree that if it doesn't work for me, it is my problem, so Dasharo developers should not get more work from my use cases (training, dasharo-pq). That's why I'm asking for feedback on how to achieve that in a non-interfered way. I could create my build.sh? I need that solution upstream because I build the whole infrastructure on that (PET and dasharo-pq).

Yes, not everyone builds the binaries the same way. But we can conclude it with one sentence "please use build.sh" and the end of topic.

If you need the BUILD_TIMELESS, then add it to the script or make the script pass the environments BUILD_TIMELESS to the docker container. I'm fine with that. However, no matter what we do, we will not fix the past, i.e. no way to get the same hash as release binary if we modify anything in the tree (build.sh including). Thus as I proposed earlier, you can always build the reference with BUILD_TIMELESS by manually pasting commands. Then re-check against this reference after any modifications you make.

@pietrushnic
Copy link
Contributor Author

It would simply need make -C payloads/external/edk2 payloads/external/edk2/workspace/Dasharo or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/Dasharo (not sure what CURDIR result will be). I don't think you have to modify Makefile. Also make -C payloads/external/edk2 payloads/external/edk2/workspace/edk2-platforms or make -C payloads/external/edk2 /home/coreboot/coreboot/payloads/external/edk2/workspace/edk2-platforms if edk2-platforms is used (can be obtained from config too)

Thanks. I will add it.

I can't agree with and a new toolchain can build old code.. Maybe it will build coreboot, but it is easy to break the build of a payload with newer toolchain.

This was probably tested with the QEMU target with coreboot. Toolchain often breaks with our EDKII, which is mainly our issue, not upstream. If iPXE breaks, then upstream fix that as part of the bump of iPXE. Not sure about other payloads. However, this proves that the toolchain has a problem because it has to build a coreboot and many different projects.

As I said, this is not an argument, so there is no need to rebut.

Why? You can do BUILD_TIMELESS using any toolchain you want and have that binary as a reference, then do whatever you want with toolchain, use BUILD_TIMELESS again and compare the results. You are not forced to use build.sh, manually invoking commands will work too. As simple as that.

Of course, I ended up with more work on explaining all the necessary details to students and dealing with the support of it live during the class. Also, my CI/CD related to dasharo-pq and PET have to be coded differently than what is used by most developers. I don't say it is not possible. It is, but I have to give up the infrastructure used by all other developers and do some more work. Before that, I wanted to try this path. I understand you say BUILD_TIMELESS is not a good fit for build.sh. That is fine. I will try to find another way. I have a gut feeling this discussion will get back to us sooner or later in another form, but maybe right now, let's give it up.

If you need the BUILD_TIMELESS, then add it to the script or make the script pass the environments BUILD_TIMELESS to the docker container. I'm fine with that. However, no matter what we do, we will not fix the past, i.e. no way to get the same hash as release binary if we modify anything in the tree (build.sh including). Thus as I proposed earlier, you can always build the reference with BUILD_TIMELESS by manually pasting commands. Then re-check against this reference after any modifications you make.

Yes, I will follow both:

  • I will extend help to explain additional environment variables considered during the build or add parameters (airgap, DASHARO_SDK, BUILD_TIMELESS) so the interface will be backward compatible.
  • I will compare only BUILD_TIMELESS hashes before and after changes to confirm that infrastructure changes didn't affect build results.

@pietrushnic pietrushnic force-pushed the ds08msa_internal_v0.4 branch 3 times, most recently from c0b1a7a to f902cda Compare March 3, 2025 22:02
@pietrushnic pietrushnic force-pushed the ds08msa_internal_v0.4 branch from 463d406 to 14dfd52 Compare March 24, 2025 23:02
@pietrushnic pietrushnic force-pushed the ds08msa_internal_v0.4 branch from 14dfd52 to 9d6d9c2 Compare March 25, 2025 22:48
@pietrushnic
Copy link
Contributor Author

@miczyg1 I simplified the whole change into three things:

  1. Switch to using DASHARO_SDK for all platforms: 7aafa91 - it is based on the same coreboot sdk, so there should be no issues.
  2. Through AIRGAP flag introduce airgap mode, which I explained in commit message: 9d6d9c2
  3. Add support for building odroid and its btg version: bb1b9a0

Please let me know if this is better and match Dasharo needs.

This change look for AIRGAP environment variable and if it is set it
perform airgap build of Dasharo for Odroid H4 and its version for
Intel Boot Guard.

This is required for security, privacy and trainers who would like to
perform 100% offline build.

To make that possible couple requirements have to be fulfilled:
- repository cannot be distcleaned, because it remove all artifacts, the
  assumption is that provided repository already has all dependencies
  fetched, so only make clean is made before proceeding
- since whole process rely on mounting edk2 as volume inside Dasharo SDK
  container, workspace directory to which it would be mount needs proper
  permissions otherwise docker will create mountpoint with root
  privileges, what cause issues in further use and build process
- finally we take into consideration BUILD_TIMELESS environment
  variable, which improve testability of build process and toolchain
  change

This change was tested by:
1. cloning relevant version of edk2
2. cloning coreboot, cd coreboot
3. running checkout on ipxe:

docker run --rm --user $(id -u):$(id -g) -v $PWD:/home/coreboot/coreboot \
  ${DASHARO_SDK} \
  make -C /home/coreboot/coreboot/payloads/external/iPXE checkout

4. Build

EDK2_REPO_PATH="${PWD}/../edk2" AIRGAP=1 BUILD_TIMELESS=1 ./build.sh odroid_h4_btg
EDK2_REPO_PATH="${PWD}/../edk2" AIRGAP=1 BUILD_TIMELESS=1 ./build.sh odroid_h4

Signed-off-by: Piotr Król <[email protected]>
@pietrushnic pietrushnic force-pushed the ds08msa_internal_v0.4 branch from bb1b9a0 to 1e016c4 Compare March 26, 2025 13:12
@miczyg1
Copy link
Contributor

miczyg1 commented Mar 27, 2025

@miczyg1 I simplified the whole change into three things:

  1. Switch to using DASHARO_SDK for all platforms: 7aafa91 - it is based on the same coreboot sdk, so there should be no issues.

We cannot use latest, because it is introducing a moving target of the SDK. We have to tag it first.

  1. Through AIRGAP flag introduce airgap mode, which I explained in commit message: 9d6d9c2
  2. Add support for building odroid and its btg version: bb1b9a0

Please let me know if this is better and match Dasharo needs.

@@ -30,7 +30,8 @@ usage() {
}

SDKVER="2024-02-18_732134932b"

DASHARO_SDK=${DASHARO_SDK:-"ghcr.io/dasharo/dasharo-sdk:latest"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should point to a stable tag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I assume we can merge and make v1.6.0 release Dasharo/dasharo-sdk#11 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it has been approved long time ago. Feel free to merge and tag it @pietrushnic

-v $HOME/.ssh:/home/coreboot/.ssh \
-w /home/coreboot/coreboot coreboot/coreboot-sdk:$SDKVER \
/bin/bash -c "make distclean"
if [ "${AIRGAP}" -eq 1 ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be applied to all boards, not just selected ones. Mayeb it would also be worth to separate it into a function to reduce the LOC and overall boilerplate.

Comment on lines +266 to +295
if [ "${AIRGAP}" -eq 1 ]; then

# In this situation we assume that provided repository is ready to be used
# and nothing should be downloaded during build process.

if [ -d "${EDK2_REPO_PATH}" ]; then
# Without following sequence workspce would be created by docker with root
# privilidges and build will fail.
# Target directory
TARGET_DIR="payloads/external/edk2/workspace/Dasharo"
mkdir -p "$TARGET_DIR"
chown -R $(id -u):$(id -g) "$TARGET_DIR"
chmod -R 755 "$TARGET_DIR"
docker run --rm -t -u $UID -v $PWD:/home/coreboot/coreboot \
-v $HOME/.ssh:/home/coreboot/.ssh \
--network none \
${EDK2_REPO_PATH:+-v $EDK2_REPO_PATH:/home/coreboot/coreboot/payloads/external/edk2/workspace/Dasharo} \
-e BUILD_TIMELESS=${BUILD_TIMELESS} \
-w /home/coreboot/coreboot ${DASHARO_SDK} \
/bin/bash -c "make olddefconfig && make -j$(nproc)"
else
echo "EDK2_REPO_PATH is not defined in AIRGAP!"
exit 1
fi
else
docker run --rm -t -u $UID -v $PWD:/home/coreboot/coreboot \
-v $HOME/.ssh:/home/coreboot/.ssh \
-w /home/coreboot/coreboot ${DASHARO_SDK} \
/bin/bash -c "make olddefconfig && make -j$(nproc)"
fi
Copy link
Contributor

@miczyg1 miczyg1 Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be applied to all boards, not just selected ones. Maybe it would also be worth to separate it into a function to reduce the LOC and overall boilerplate. I think the docker commands to start the build are the same for every board.

Copy link
Contributor

@miczyg1 miczyg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script could take some small improvements

@miczyg1
Copy link
Contributor

miczyg1 commented Mar 27, 2025

@pietrushnic since you change the SDK in build.sh, it would be wise to change it in GH action workflows as well. That way CI will build test everything using your new SDK

@miczyg1 miczyg1 changed the base branch from dasharo to dasharo-24.02.1 March 28, 2025 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants