
HOTP asked to be resealed even if TOTP good (Picks up on a reinstalled OS even if firmware measurements haven't changed) #1935

Draft: wants to merge 18 commits into base: master

tlaurion
Collaborator

@tlaurion tlaurion commented Mar 17, 2025

Fixes #1562
Supersedes #1934, with the reviewed changes


@marmarek / @JonathonHall-Purism: good enough to merge?
EDIT: Thanks @JonathonHall-Purism for your review. All comments addressed.


Changes:

  • No more three TPMTOTP unseal attempts to work around the race condition when multiple consoles use the TPM simultaneously (introduced with the Talos-2 port, which outputs to both the BMC and display consoles): now, if the TPM unseal fails, we die
    • die() now asks the user to press Enter, which makes it clearer in the UX what failed
  • Adds DEBUG + TRACE_FUNC calls and call stack traces (this is 280cb1f, i.e. PR #1934 "DEBUG+TRACE mode: provide a complete call stack trace on console / debug.log")
  • Bumps hotp-verification to 1.7+ with unreleased fixes (1.71/1.8) so that hotp_verification output is printed on multiple lines even when physical presence detection hiccups
  • Clarifies TPM counter creation and increment (I had a hard time understanding why things were not working; this is now easier to understand and debug in the future, especially with the board in DEBUG+TRACE mode)
    • Added TRACE_FUNC calls (which now output the call hierarchy) along the way, so the call trace is easy to follow in DEBUG+TRACE mode.
  • Indentation converted to tabs in all reviewed code.
  • When dropping to the recovery shell, guide the user on how to provide logs.
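For readers wondering how a shell script can print its own call hierarchy, here is a minimal sketch using bash's FUNCNAME array. It is not Heads' actual TRACE_FUNC; `trace_func` and the example functions `gui_init`/`unseal_totp` are hypothetical names used only for illustration.

```bash
#!/bin/bash
# Hypothetical sketch, not Heads' TRACE_FUNC: print the chain of bash
# functions that led to the caller, outermost first, on stderr.
trace_func() {
	local i name chain=""
	# FUNCNAME[0] is trace_func itself; higher indexes are older callers.
	for ((i = ${#FUNCNAME[@]} - 1; i >= 1; i--)); do
		name="${FUNCNAME[$i]}"
		# Skip the pseudo-frames bash reports for the script body
		case "$name" in main | source) continue ;; esac
		chain+="${chain:+ -> }$name"
	done
	echo "TRACE: $chain" >&2
}

# Hypothetical callers: gui_init calls unseal_totp, which traces itself.
# Calling gui_init prints "TRACE: gui_init -> unseal_totp" on stderr.
gui_init() { unseal_totp; }
unseal_totp() { trace_func; }
```

Printing outermost-first reads like a breadcrumb, which is what makes a debug.log easy to follow when many scripts call each other.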

@tlaurion
Collaborator Author

tlaurion commented Mar 18, 2025

@marmarek

This is ./docker_repro.sh make BOARD=qemu-coreboot-fbwhiptail-tpm2-hotp-prod_quiet USB_TOKEN=Nitrokey3NFC PUBKEY_ASC=pubkey.asc inject_gpg run with nk3 passed to testing qube.

Simulating OS reinstallation (wiping /boot/kexec*)

So same firmware, meaning:

  • Same firmware, with the public key fused in the ROM and measured, so TPMTOTP is good. Heads should warn that the HOTP counter doesn't exist and prompt for an HOTP reseal.
  • Let's wipe /boot/kexec* files:
    2025-03-17-200019
  • Reboot. Then:
    2025-03-17-200115
    2025-03-17-200208
  • Then the missing default boot selection should be picked up, guiding the user into signing and selecting a default:
    2025-03-17-200244
    2025-03-17-200319
    2025-03-17-200408
  • And finally, it asks whether a TPM DUK should be created (optional):
    2025-03-17-195901

@JonathonHall-Purism

Other changes

  • Stubborn users who don't follow on-screen instructions are still reminded that they haven't followed them. die() now requires them to "Press any key":
    2025-03-17-201600
    2025-03-17-201614
    2025-03-17-201632
    2025-03-17-201655
    2025-03-17-201729
    2025-03-17-201755
  • The rest works as usual.

@tlaurion tlaurion marked this pull request as draft March 18, 2025 00:21
@tlaurion tlaurion changed the title WiP - HOTP asked to be resealed even if TOTP good (Picks up on a reinstalled OS even if firmware measurements haven't changed) HOTP asked to be resealed even if TOTP good (Picks up on a reinstalled OS even if firmware measurements haven't changed) Mar 18, 2025
@tlaurion tlaurion marked this pull request as ready for review March 18, 2025 00:21
@tlaurion
Collaborator Author

@JonathonHall-Purism in debug.

This is ./docker_repro.sh make BOARD=qemu-coreboot-whiptail-tpm2-hotp USB_TOKEN=Nitrokey3NFC PUBKEY_ASC=pubkey.asc inject_gpg run

To show 280cb1f

  • The user does Reseal TPMTOTP/HOTP instead of Reset TPM:
    2025-03-17-202616
    2025-03-17-202819

So now the user is locked into the following steps:

  • Reset TPM, which reseals TPMTOTP/HOTP, updates checksums and signs /boot
  • Be happy (the user is guided into doing the right thing)

Full debug log of the combined steps (the user doing Reset TPM as asked, up to setting the TPM DUK), with a much more interesting call stack for everyone to understand:
full_debug_trace-from_tpm_reset-to_TPM_DUK.txt

Note:
For those who don't own an NK3/Librem Key: use the non-hotp variants. QEMU boards enforce canokey for GPG OpenPGP smartcard operations; follow targets/qemu.md, starting with ./docker_repro.sh make BOARD=qemu-coreboot-fbwhiptail-tpm2. You can learn Heads, or develop/contribute, by running the non-prod_quiet versions, which output in TRACE+DEBUG mode.

@marmarek
Contributor

To be clear - previously after OS reinstall (including removal of /boot/kexec* files) the recommendation was to do full TPM reset, and now just resealing the HOTP secret should work, right?

@@ -29,7 +29,13 @@ mount_boot_or_die
#counter_value=$(read_tpm_counter $counter | cut -f2 -d ' ' | awk 'gsub("^000e","")')
#

counter_value=$(cat $HOTP_COUNTER)
#if HOTP_COUNTER is not present, bail out
if [ ! -f $HOTP_COUNTER ]; then
Collaborator Author

@tlaurion tlaurion Mar 18, 2025

@marmarek the issue was that previously, if TPMTOTP had been sealed and could be unsealed (measured boot from coreboot+Heads), Heads did not check whether the HOTP counter under /boot/kexec_hotp_counter was still present.

@@ -280,7 +269,10 @@ update_hotp() {
HOTP='N/A'
fi

if [[ "$CONFIG_TPM" = n && "$HOTP" = "Invalid code" ]]; then
if [[ "$HOTP" = "Invalid code" ]]; then
Collaborator Author

@tlaurion tlaurion Mar 18, 2025

This check only verified whether HOTP was invalid when no TPM was in use.

So now, if there is no /boot/kexec_hotp_counter and TPMTOTP can be unsealed, the user is prompted to reseal HOTP alone (the OS reinstall use case, without a firmware upgrade).
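A minimal sketch of that decision, assuming the TPMTOTP unseal result is already known and the HOTP counter path is passed in (in Heads it is /boot/kexec_hotp_counter). The function name and the returned action strings are hypothetical, not Heads code; the TPM2 primary handle and kexec_rollback.txt cases are deliberately left out.

```bash
#!/bin/bash
# Hypothetical sketch of the reseal decision; not Heads' actual code.
# $1 = "ok" if TPMTOTP unsealed, anything else otherwise
# $2 = path to the HOTP counter file
choose_recovery_action() {
	local totp_status="$1" hotp_counter="$2"
	if [ "$totp_status" != "ok" ]; then
		# Measurements changed or the TPM is unusable: TOTP must be resealed too
		echo "reseal-totp-hotp"
	elif [ ! -f "$hotp_counter" ]; then
		# Same firmware, but /boot was wiped (OS reinstall): reseal HOTP only
		echo "reseal-hotp-only"
	else
		echo "continue"
	fi
}
```

The point of the ordering is that the HOTP counter is only consulted once TOTP is known to be good, which is exactly the case the old check skipped when a TPM was present.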

Collaborator Author

But what if we have kexec_rollback.txt? All of this doesn't make any sense: the user should reset the TPM here, or more logic needs to be refactored.

Collaborator Author

@tlaurion tlaurion left a comment

@marmarek I added comments in the code: with this PR, a missing HOTP counter guides the user to reseal only HOTP and to regenerate the hashes and the detached signed /boot digest files, then to select a default boot option, and finally proposes setting a TPM DUK.

So a TPM reset is no longer needed after an OS reinstallation on TPM1.
For TPM2, the TPM primary handle still needs to be created and hashed, so a TPM reset is still advised (the output says so as well) to create the primary handle and its hash.

This is a workaround for your issue.

  • #1480 is still the real solution for getting rid of the TPM2 primary handle (discussion at #1655)
  • Getting rid of the HOTP counter is tracked in #1651

@tlaurion tlaurion force-pushed the hotp_fixup_without_firmware_upgrade_boot_wiped branch from 6159012 to 75e766a Compare March 18, 2025 15:42
Collaborator

@JonathonHall-Purism JonathonHall-Purism left a comment

Thanks @tlaurion, strategy looks good to me and I left comments on some of the details 💯

@@ -789,12 +820,17 @@ increment_tpm_counter() {
TRACE_FUNC
tpmr counter_increment -ix "$1" -pwdc '' |
tee /tmp/counter-$1 >/dev/null 2>&1 ||
die "TPM counter increment failed for rollback prevention. Please reset the TPM"
die "TPM counter increment failed for rollback prevention. Please reset the TPM. Press Enter to continue"
#TODO: why the die here needs to say to Press Enter to continue? Should be part of die call?
Collaborator

Maybe because oem-factory-reset has its own die()? It's not equivalent to this one though so you can't just delete it (kills some TOP_PID it captured rather than exiting) 🤔

@tlaurion tlaurion marked this pull request as draft March 25, 2025 16:28
@tlaurion tlaurion force-pushed the hotp_fixup_without_firmware_upgrade_boot_wiped branch 4 times, most recently from 5cfcc73 to 47c696f Compare March 25, 2025 22:25
@@ -719,7 +719,7 @@ tpm1_reset() {
DO_WITH_DEBUG tpm physicalsetdeactivated -c &>/dev/null
DO_WITH_DEBUG tpm forceclear &>/dev/null
DO_WITH_DEBUG tpm physicalenable &>/dev/null
DO_WITH_DEBUG tpm takeown -pwdo "$tpm_owner_password" &>/dev/null
DO_WITH_DEBUG --mask-position 3 tpm takeown -pwdo "$tpm_owner_password" &>/dev/null
Collaborator Author

DEBUG was exposing the TPM owner passphrase in the log and in debug.log.
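To illustrate the idea behind --mask-position, here is a minimal sketch that logs a command with one word replaced by a placeholder while still running it with the real arguments. It is hypothetical and not Heads' DO_WITH_DEBUG; it assumes the position counts the command name as 0, so position 3 lands on the password in `tpm takeown -pwdo "$tpm_owner_password"`.

```bash
#!/bin/bash
# Hypothetical sketch, not Heads' DO_WITH_DEBUG.
# Usage: run_with_masked_log POSITION command arg1 arg2 ...
# POSITION counts the command name as 0.
run_with_masked_log() {
	local pos="$1"
	shift
	local -a shown=("$@")
	# Only the copy used for logging is modified
	if [ "$pos" -ge 0 ] && [ "$pos" -lt "${#shown[@]}" ]; then
		shown[$pos]="<hidden>"
	fi
	echo "DEBUG: ${shown[*]}" >&2
	# Run the command with its unmodified arguments
	"$@"
}

# Example (not run here): logs "DEBUG: tpm takeown -pwdo <hidden>"
# run_with_masked_log 3 tpm takeown -pwdo "$tpm_owner_password"
```

Masking by position keeps the rest of the command visible in debug.log, which is what makes the trace useful without leaking the secret.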

@tlaurion tlaurion force-pushed the hotp_fixup_without_firmware_upgrade_boot_wiped branch 2 times, most recently from c38c13b to dae9e85 Compare March 25, 2025 22:33
@@ -5,9 +5,9 @@ find /boot/kexec*.txt | gpg --verify /boot/kexec.sig -
#remove invalid kexec_* signed files
mount /dev/sda1 /boot && mount -o remount,rw /boot && rm /boot/kexec* && mount -o remount,ro /boot
#Generate keys on OpenPGP smartcard:
mount-usb && gpg --home=/.gnupg/ --card-edit
mount-usb --mode rw && gpg --home=/.gnupg/ --card-edit
Collaborator Author

Nitpick: the bash history now suggests mount-usb --mode rw.

…ives call hierarchy, fix HOTP resealing only on OS reinstall, clarify TPM increment workflow

Signed-off-by: Thierry Laurion <[email protected]>
@tlaurion tlaurion force-pushed the hotp_fixup_without_firmware_upgrade_boot_wiped branch from dae9e85 to 1f6a975 Compare March 26, 2025 03:12
@tlaurion tlaurion marked this pull request as ready for review March 26, 2025 03:17
@@ -183,17 +183,6 @@ update_totp() {
TOTP="NO TPM"
else
TOTP=$(unseal-totp)
# On platforms using CONFIG_BOOT_EXTRA_TTYS multiple processes may try to
Collaborator Author

No more three attempts to unseal TPMTOTP on boot. With multiple consoles (e.g. Talos-2 with display console + BMC), at worst we could introduce a small delay if the race condition still happens. die now asks the user to press Enter, guiding them to reseal TPMTOTP, or to reset the TPM if TPM NVRAM cannot be accessed.

@tlaurion
Collaborator Author

tlaurion commented Mar 26, 2025

  • Using it on nv41, wiped /boot/kexec* to simulate an OS reinstall: ok
  • Tested under qemu tpm2:
    • Wiped build///vtpm: resealing TPMTOTP/HOTP dies, telling the user a TPM reset is needed (TPM not owned): ok
    • Wiped /boot/kexechotp: Reseal HOTP is offered when TPMTOTP is good, which tests your reinstall case: ok

…hecksums. Warn the user before actually booting (shows a console warning, waits 2s, then reboots)

Signed-off-by: Thierry Laurion <[email protected]>
@tlaurion

This comment was marked as off-topic.

… prompt for recovery shell access, state where debug logs are in a centralized way

Note for linuxboot#1888:
warn in the code is mostly used to actually warn the user of something requiring their attention, pausing for 2 seconds.

Goal is:
die: blocking; tells the user that something failed and requires acknowledgement before corrective action.
warn: displays messages prefixed with "WARNING:" and pauses for 2 seconds before continuing. This is neither an error nor INFO.
INFO: in QUIET mode, leaves a trace for the user in /tmp/debug.log of core component output, typically measurement traces.

Consequently, moving what is currently under warn to INFO would silence it on the console. We want to get rid of manual "echo +++++" messages.
So it seems we are missing a split: what is currently named INFO should go into a measurement_log, while INFO (green), warn (yellow) and die (red) messages go to the console.

Signed-off-by: Thierry Laurion <[email protected]>
Signed-off-by: Thierry Laurion <[email protected]>
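The die/warn split described in this commit message could look roughly like the minimal sketch below. It is hypothetical: Heads' real helpers also colorize output and log to /tmp/debug.log, and the warn delay is a variable here (WARN_DELAY is an illustrative name) because this PR later settles on 1 second rather than 2. die reads a whole line, so only Enter continues, not just any key.

```bash
#!/bin/bash
# Hypothetical sketch of die/warn semantics; not Heads' actual functions.
WARN_DELAY="${WARN_DELAY:-1}"

# warn: non-fatal, visible notice that pauses briefly before continuing
warn() {
	echo "WARNING: $*" >&2
	sleep "$WARN_DELAY"
}

# die: blocking error that requires acknowledgement, then exits
die() {
	echo "ERROR: $*" >&2
	printf 'Press Enter to continue...' >&2
	# read consumes a full line, so only Enter (not any key) continues
	read -r _ || true
	exit 1
}
```

Keeping die blocking and warn time-bounded is what lets the two be told apart by the user: one demands action, the other only needs to be seen.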
… being set: observed in fbwhiptail-tpm2-hotp-prod_quiet

  991 root      3272 S    {gui-init} /bin/bash /bin/gui-init
 2024 root      2792 S    {kexec-select-bo} /bin/bash /bin/kexec-select-boot -
 2025 root      1364 S    sha256sum -c /tmp/kexec/kexec_default_hashes.txt
 2105 root      2068 S    /bin/bash

Signed-off-by: Thierry Laurion <[email protected]>
…. Logs for first under usb.raw to check against HOTP reseal

Signed-off-by: Thierry Laurion <[email protected]>
@JonathonHall-Purism
Collaborator

@tlaurion Looked over the recent changes, could you cherry pick this tweak to finish addressing my nitpicks please 🙂 ad807f4

I see you have a few TODOs left to address; it looks OK to me otherwise. Let me know when you would like further review of those fixes.

tlaurion and others added 8 commits April 28, 2025 14:09
…hrase equiv) + aesthetic fixes

Signed-off-by: Thierry Laurion <[email protected]>
…p_without_firmware_upgrade_boot_wiped-staging

Signed-off-by: Thierry Laurion <[email protected]>
Asking to press Enter is more forgiving than "any key" and good, but we
also have to actually continue on Enter instead of any key.

Signed-off-by: Jonathon Hall <[email protected]>
… sleep one second before continuing

Signed-off-by: Thierry Laurion <[email protected]>
…l selected containers prior to prompting for new DUK

Signed-off-by: Thierry Laurion <[email protected]>
…nderstanding and debugging

Signed-off-by: Thierry Laurion <[email protected]>
…d leaves 1 second to the user to read the notice

Signed-off-by: Thierry Laurion <[email protected]>
tlaurion added 2 commits June 23, 2025 15:16
Tested under QEMU
- wipe of /boot/kexec_*
- TPM reset + boot default + define default + TPM DUK
- remove qemu *.rom files (so the injected keyring is unique and triggers a TPM unseal error on boot)
- Reseal TPMTOTP+HOTP succeeds, with debug output showing the TPM counter increment succeeding
- comparing hashes under /boot/kexec_rollback.txt confirms the TPM counter increment works and is validated (the rollback file prevents copying old kexec*.txt + kexec.sig files into /boot)

Signed-off-by: Thierry Laurion <[email protected]>
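The rollback idea tested above can be sketched as follows, assuming (for illustration only) that kexec_rollback.txt holds a sha256sum of the TPM counter read-out, such as the /tmp/counter-$1 file written by increment_tpm_counter. Once the counter is incremented, the read-out changes, so an old kexec_rollback.txt copied back into /boot together with its old signature no longer verifies. The function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch of a rollback check; the file layout is an assumption.

# record_rollback COUNTER_FILE ROLLBACK_FILE: store the hash of the counter read-out
record_rollback() {
	sha256sum "$1" >"$2"
}

# check_rollback ROLLBACK_FILE: succeeds only while the counter read-out
# still matches the recorded hash (fails once the counter has moved on)
check_rollback() {
	sha256sum -c "$1" >/dev/null 2>&1
}
```

Because the TPM counter is monotonic, the recorded hash can only ever match the current counter state, which is what makes replaying older signed /boot files detectable.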
@tlaurion
Collaborator Author

tlaurion commented Jun 23, 2025

#1935 (comment) testing showed:

TODO, fix

@marmarek if you could retest with 1a8d685, I think this is good to merge after testing and a final code review by @JonathonHall-Purism (die now waits for input; warn waits 1 second; after reading the docs, neither requires doc changes).
@macpijan this will most probably require changes to automated testing, if any exists (as can be seen in the screenshots at #1935 (comment): ERROR: messages are now followed by Press Enter to continue...). You may want to open an issue pointing to the testing changes introduced here.

I can clean up the commit logs once testing confirms that everything reported here is fixed (clear console output, clearer code, warnings that can actually be seen, errors that require user acknowledgement, etc., all of which feed into the now-merged first steps of #1888), but I don't have much more time to put into this after that.

@tlaurion tlaurion requested review from JonathonHall-Purism and removed request for JonathonHall-Purism June 23, 2025 20:13
@marmarek
Contributor

Thanks, I'll test it soon, but it will take me some time. That test system got hit with #1882 again :(

@marmarek
Contributor

HOTP part seems to be okay now, but signing boot files fails with can't stat '/tmp/kexec/kexec_tree.txt.user': No such file or directory and then prompts for the TPM Owner Password. Or maybe the message is about something else (rollback counter)? It's not clear from the message, but normally I'd expect GPG card pin at this stage...
See here: https://openqa.qubes-os.org/tests/145106#step/firstboot/12 (and the next step)

@tlaurion
Collaborator Author

Recontextualization of your use case:
1. Reinstall QubesOS on a previously owned laptop (TPM owner password is set)

  • There are no kexec_*.txt files left by Heads under /boot, since /boot was wiped by the reinstall
    • This also means there is no kexec_tree.txt/kexec_tree.txt.user to compare against, since neither a default boot option nor /boot hashes were saved before. (I could silence this, but that would still require the TPM owner passphrase to set the TPM counter, yes.)

@marmarek what would you like to see here? I agree the diff/missing-file message is not helpful here. But what would be?

…ious tree file exists to be compared against in diff

Signed-off-by: Thierry Laurion <[email protected]>
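The guard described in this commit title could look like this minimal sketch: compare the current /boot tree only when a previous tree file exists, and otherwise print an explanation instead of a "can't stat" error. The function name and message are hypothetical, not Heads code.

```bash
#!/bin/bash
# Hypothetical sketch: only diff /boot tree listings when a previous one exists.
# $1 = previous tree file (e.g. kexec_tree.txt), $2 = current tree file
compare_boot_tree() {
	if [ ! -f "$1" ]; then
		# Expected right after an OS reinstall: nothing was saved before
		echo "No previous /boot tree saved; nothing to compare"
		return 0
	fi
	# diff exits 0 when identical, 1 when the trees differ
	diff "$1" "$2"
}
```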
@marmarek
Contributor

OK, so it seems the issue is about the message: normally the "Missing file" error is about signing /boot content with gpg, and the next step is signing them (see https://openqa.qubes-os.org/tests/145090#step/firstboot/4 for example). The TPM owner password prompt is unexpected and surprising. So maybe just add one line before that prompt explaining why it's asking for it? Alternatively, change the "Missing file" error to include that info?
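A minimal sketch of the first suggestion: print one line explaining why the TPM Owner Password is needed right before prompting for it. The function name and wording are hypothetical; the reason given (creating/incrementing the TPM rollback counter) is the one stated earlier in this thread.

```bash
#!/bin/bash
# Hypothetical sketch: explain the prompt before asking for the password.
# The explanation and prompt go to stderr; the password goes to stdout.
prompt_tpm_owner_password() {
	local pw
	echo "The TPM rollback counter protecting signed /boot files must be created or incremented." >&2
	printf 'TPM Owner Password: ' >&2
	# -s hides typing when reading from a terminal
	read -r -s pw || return 1
	echo >&2
	printf '%s' "$pw"
}
```

Putting the explanation on the same screen as the prompt answers the "why now?" question without changing the existing "Missing file" flow.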

Successfully merging this pull request may close these issues.

Unable to create rollback file after OS reinstall (Regenerate TOTP/HOTP)
3 participants