Fix handling of invalid object handles #495

bukka · 2024-12-27T14:16:12Z

Description

This is an initial fix for #494, which describes the actual issue in detail.

The fix first removes the errors raised during session refreshing if the session was successfully re-opened.

It then exposes a function for refreshing invalid objects. This uses the same logic as the after-fork refresh triggered from the store.

Finally, it adds logic in the signature code for handling the CKR_OBJECT_HANDLE_INVALID error: it refreshes the object and repeats the operation. It also clears the errors if both the refresh and the repeated operation succeed.
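The refresh-and-repeat step described above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `demo_key`, `demo_sign_init` and `demo_refresh_invalid` are hypothetical stand-ins (the last one plays the role of `p11prov_obj_refresh_invalid`), and the PKCS#11 types are redefined locally so the snippet compiles without `pkcs11.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for PKCS#11 types; the return code values match pkcs11t.h. */
typedef unsigned long CK_RV;
typedef unsigned long CK_OBJECT_HANDLE;
#define CKR_OK 0x00000000UL
#define CKR_OBJECT_HANDLE_INVALID 0x00000082UL

/* Hypothetical key object: only the handle matters for this sketch. */
struct demo_key {
    CK_OBJECT_HANDLE handle;       /* handle cached by the provider */
    CK_OBJECT_HANDLE fresh_handle; /* what a new lookup on the token returns */
    int refresh_count;             /* how many refreshes were performed */
};

/* Hypothetical stand-in for C_SignInit: fails while the cached handle is stale. */
static CK_RV demo_sign_init(const struct demo_key *key)
{
    assert(key != NULL);
    return key->handle == key->fresh_handle ? CKR_OK
                                            : CKR_OBJECT_HANDLE_INVALID;
}

/* Hypothetical stand-in for the refresh helper: re-fetch the handle. */
static CK_RV demo_refresh_invalid(struct demo_key *key)
{
    assert(key != NULL);
    key->handle = key->fresh_handle;
    key->refresh_count++;
    return CKR_OK;
}

/* Try the operation; on an invalid handle, refresh once and repeat it. */
static CK_RV demo_sign_init_with_refresh(struct demo_key *key)
{
    CK_RV ret = demo_sign_init(key);
    if (ret == CKR_OBJECT_HANDLE_INVALID
        && demo_refresh_invalid(key) == CKR_OK) {
        ret = demo_sign_init(key); /* single retry, never a loop */
    }
    return ret;
}
```

The retry is deliberately limited to one attempt, so a handle that stays invalid after the refresh surfaces as an error instead of spinning.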

For my use case signInit is enough, but I realise this would be quite incomplete, so other places should be addressed in a similar way. Before I look into that, I wanted to hear whether this approach would be acceptable, so please let me know your thoughts. If it is acceptable, I will be happy to look into providing some tests.

Checklist

Code modified for feature
Test suite updated with functionality tests
Test suite updated with negative tests
Documentation updated

Reviewer's checklist:

Any issues marked for closing are addressed
There is a test suite reasonably covering new functionality or modifications
This feature/change has adequate documentation added
Code conform to coding style that today cannot yet be enforced via the check style test
Commits have short titles and sensible commit messages
Coverity Scan has run if needed (code PR) and no new defects were found

simo5

Sounds reasonable in principle; however, this change would have to be added to every "_init()" of every operation.
So all the operations in asymmetric_cipher.c, exchange.c, etc.

simo5 · 2025-01-06T16:33:14Z

src/signature.c

    ret = p11prov_sig_operate_init(sigctx, false, &session);
    if (ret != CKR_OK) {
-        return ret;
+        if (ret == CKR_OBJECT_HANDLE_INVALID && p11prov_obj_refresh_invalid(sigctx->key) == CKR_OK) {


You should move this whole section directly into p11prov_sig_operate_init().

simo5 · 2025-01-06T16:34:47Z

src/signature.c

@@ -870,7 +870,7 @@ static CK_RV p11prov_sig_operate_init(P11PROV_SIG_CTX *sigctx, bool digest_op,
    }

 done:
-    if (ret != CKR_OK) {
+    if (ret != CKR_OK && ret != CKR_OBJECT_HANDLE_INVALID) {


if you do the refreshing in this function, as you should, then this will go away, as it should.
This exception here is a trap for any of the callers.

simo5 · 2025-01-06T16:39:27Z

Note that this kind of change will need tests, or it will regress easily.

bukka · 2025-01-06T20:57:30Z

@simo5 Cool, thanks. OK, I will look into adding that to the other _init functions, make it more generic, and take your comments into account.

In terms of tests, I could look into adding an integration test with nginx and pkcs11-proxy, which should cover the sign case and would probably also be useful for testing the fork reload, as nginx forks workers. Would you also want tests for the other _init functions? If so, I would probably have to look into some longer-running app so I can restart the daemon and then retry. If you have an idea about an existing application for such testing, that would be great!

simo5 · 2025-01-06T21:07:40Z

> @simo5 Cool, thanks. OK, I will look into adding that to the other _init functions, make it more generic, and take your comments into account.
>
> In terms of tests, I could look into adding an integration test with nginx and pkcs11-proxy, which should cover the sign case and would probably also be useful for testing the fork reload, as nginx forks workers. Would you also want tests for the other _init functions? If so, I would probably have to look into some longer-running app so I can restart the daemon and then retry. If you have an idea about an existing application for such testing, that would be great!

Assuming we can make a generic function for this, then just one test application with the proxy should be fine. Let's see where it goes as you add code.
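One possible shape for such a generic function is a wrapper that takes the operation-specific init and the refresh as callbacks. This is only a sketch under assumptions: `op_init_with_refresh`, the callback types, and `toy_ctx` are hypothetical names that do not exist in the provider, and the PKCS#11 return codes are redefined locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long CK_RV;
#define CKR_OK 0x00000000UL
#define CKR_GENERAL_ERROR 0x00000005UL
#define CKR_OBJECT_HANDLE_INVALID 0x00000082UL

/* Hypothetical callback types: one operation-specific init, one key refresh. */
typedef CK_RV (*op_init_fn)(void *opctx);
typedef CK_RV (*key_refresh_fn)(void *opctx);

/* Run init; if the handle is stale, refresh and run init exactly once more. */
static CK_RV op_init_with_refresh(op_init_fn init, key_refresh_fn refresh,
                                  void *opctx)
{
    assert(init != NULL && refresh != NULL);
    CK_RV ret = init(opctx);
    if (ret != CKR_OBJECT_HANDLE_INVALID) {
        return ret; /* success, or an error the refresh cannot fix */
    }
    if (refresh(opctx) != CKR_OK) {
        return CKR_OBJECT_HANDLE_INVALID; /* report the original failure */
    }
    return init(opctx);
}

/* Toy context used only to exercise the helper. */
struct toy_ctx {
    bool stale;         /* cached handle is invalid */
    bool refresh_fails; /* simulate a key that cannot be re-fetched */
    int init_calls;
    int refresh_calls;
};

static CK_RV toy_init(void *p)
{
    struct toy_ctx *c = p;
    c->init_calls++;
    return c->stale ? CKR_OBJECT_HANDLE_INVALID : CKR_OK;
}

static CK_RV toy_refresh(void *p)
{
    struct toy_ctx *c = p;
    c->refresh_calls++;
    if (c->refresh_fails) {
        return CKR_GENERAL_ERROR;
    }
    c->stale = false;
    return CKR_OK;
}
```

With this shape each "_init()" passes its own init routine, so the retry policy lives in one place and callers never see a special-cased return code.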

simo5 · 2025-01-06T21:26:25Z

Just a word of warning, closing sessions means destroying all ephemeral keys created as session objects.
This means some handles will never be recoverable. I think we need a way to destroy or mark those objects as unrecoverable otherwise we may end up trying to recover them over and over in some cases.
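A minimal sketch of the "mark as unrecoverable" idea, assuming a hypothetical object record (`eph_obj`) and a simulated token lookup; none of these names come from the provider. The point is that a lost session object fails fast on every later attempt instead of triggering another recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long CK_RV;
typedef unsigned long CK_OBJECT_HANDLE;
#define CKR_OK 0x00000000UL
#define CKR_OBJECT_HANDLE_INVALID 0x00000082UL
#define CK_INVALID_HANDLE 0UL

/* Hypothetical object record; field names are illustrative only. */
struct eph_obj {
    CK_OBJECT_HANDLE handle;
    bool session_object; /* ephemeral key that lived only in a session */
    bool unrecoverable;  /* set once recovery is known to be impossible */
    int lookup_attempts; /* counts simulated token lookups */
};

/* Simulated token lookup: pretend the key is found again under handle 5. */
static CK_OBJECT_HANDLE eph_token_lookup(struct eph_obj *obj)
{
    obj->lookup_attempts++;
    return 5;
}

static CK_RV eph_refresh(struct eph_obj *obj)
{
    assert(obj != NULL);
    if (obj->unrecoverable) {
        return CKR_OBJECT_HANDLE_INVALID; /* fail fast, no further attempts */
    }
    if (obj->session_object) {
        /* The owning session is gone, so the key no longer exists. */
        obj->handle = CK_INVALID_HANDLE;
        obj->unrecoverable = true;
        return CKR_OBJECT_HANDLE_INVALID;
    }
    obj->handle = eph_token_lookup(obj); /* token objects can be re-fetched */
    return CKR_OK;
}
```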

bukka · 2025-01-06T21:29:24Z

Ah yeah, good point. I will look into it.

bukka · 2025-03-10T15:27:59Z

So after doing various tests with it, I think this approach is not viable because it can result in sessions getting mixed up. In my case the problem is that when the proxy daemon is restarted, it resets all sessions (and all handles). This means the session handles will be indexed from 1 again after the restart, because that's how the SoftHSMv2 session manager works. The provider might then try to use a handle created before the restart that has in the meantime (after the restart) been recreated in another slot, so a valid handle exists but it represents a different key. I think it just cannot work with continuous refreshing on an invalid handle; it needs to reset all sessions, as is done after the fork.

What I'm thinking of instead is to do a slot refresh if there is a device error (maybe with a config option, so that the current behaviour of trying to continue still works for devices that can keep working after a device error). @simo5 Would that be an acceptable approach?

simo5 · 2025-03-10T17:04:15Z

> So after doing various tests with it, I think this approach is not viable because it can result in sessions getting mixed up. In my case the problem is that when the proxy daemon is restarted, it resets all sessions (and all handles).

Yes this is what I would expect.

> This means the session handles will be indexed from 1 again after the restart, because that's how the SoftHSMv2 session manager works. The provider might then try to use a handle created before the restart that has in the meantime (after the restart) been recreated in another slot, so a valid handle exists but it represents a different key.

Sessions never represent keys, but sessions can hold ephemeral keys, and those would simply be lost. Also, sessions are not hermetic: if you have a key with a handle you can use it on any session, even if the key was created as an ephemeral key on a different session, but such keys are lost once the session that holds them is closed.

More problematic is that key handles will also be reset, and now you may be using a completely random key, that does not match type or anything. This is why on fork we refresh all handles by re-fetching the keys. Unfortunately this means losing again all ephemeral keys because they are lost when the holding session is closed.
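The aliasing hazard can be reproduced with a toy model. The sketch below assumes a simulated token that hands out handles sequentially from 1 and forgets everything on restart, as described for SoftHSMv2 earlier in this thread; `toy_token` and its functions are hypothetical and not a real PKCS#11 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_OBJECTS 8
typedef unsigned long CK_OBJECT_HANDLE;

/* Simulated token: slot 0 is unused because 0 is CK_INVALID_HANDLE. */
struct toy_token {
    const char *labels[TOY_MAX_OBJECTS + 1];
    CK_OBJECT_HANDLE next;
};

/* A restart drops every object and starts numbering from 1 again. */
static void toy_restart(struct toy_token *t)
{
    assert(t != NULL);
    for (size_t i = 0; i <= TOY_MAX_OBJECTS; i++) {
        t->labels[i] = NULL;
    }
    t->next = 1;
}

/* Load a key and return the sequential handle assigned to it. */
static CK_OBJECT_HANDLE toy_load(struct toy_token *t, const char *label)
{
    assert(t != NULL && t->next <= TOY_MAX_OBJECTS);
    t->labels[t->next] = label;
    return t->next++;
}

/* Return the label behind a handle, or NULL if the handle is invalid. */
static const char *toy_resolve(const struct toy_token *t, CK_OBJECT_HANDLE h)
{
    assert(t != NULL);
    if (h == 0 || h > TOY_MAX_OBJECTS) {
        return NULL;
    }
    return t->labels[h];
}

/* True only if the handle currently names the expected key. */
static bool toy_handle_names(const struct toy_token *t, CK_OBJECT_HANDLE h,
                             const char *label)
{
    const char *found = toy_resolve(t, h);
    return found != NULL && strcmp(found, label) == 0;
}
```

Right after a restart the stale handle is merely invalid, which is the recoverable case. Once other keys have been loaded, the same numeric handle resolves without error but names a different key, and no error code signals the mismatch.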

> I think it just cannot work with continuous refreshing on an invalid handle; it needs to reset all sessions, as is done after the fork.

Exactly.

> What I'm thinking of instead is to do a slot refresh if there is a device error (maybe with a config option, so that the current behaviour of trying to continue still works for devices that can keep working after a device error). @simo5 Would that be an acceptable approach?

I am not sure, I think that in order to have the ability to recover from a token error you may also need to cache any key data that was imported in the session, so that you can restore ephemeral keys that openssl copied in to do some operation. But this will still miss any session key generated directly on the token.. oh well.

I wonder if it wouldn't be better to just put pkcs11-provider in an error state and have the application restart ...

But perhaps it is sufficient to ensure that all the objects that were lost stay around but are somehow neutered (invalid handle) and always return errors when OpenSSL tries to use them. You can't just drop objects because they are pointed at by EVP_PKEY->keydata, so they need to stay around until OpenSSL frees the data.
I forget if we already properly handle this in the fork case; if we do, then maybe it is OK.

bukka · 2025-03-10T18:25:52Z

Yeah, the fork logic just marks the object as invalid but does not free it. Specifically, this happens in p11prov_slot_fork_reset, which calls p11prov_session_pool_fork_reset to reset all sessions (setting an invalid handle on them) and then p11prov_obj_pool_fork_reset to do the same for the objects in the pool. It also marks objects as raf so they get refreshed automatically. From what I see, the objects should stay in the pool the whole time, so it might cover them all, but I will need to test it. I noticed a small issue with re-importing imported keys, which I have fixed in bukka@f9066d2, but I need to see if it applies in this case as well. Potentially this one might be needed as well.
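The invalidate-but-keep pattern described here can be sketched as follows. This is a simplified model, not the provider's code: `pool_obj`, `pool_reset` and `pool_obj_handle` are hypothetical, the `raf` field only mirrors the flag named above, and the token lookup is simulated by a stored value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long CK_OBJECT_HANDLE;
#define CK_INVALID_HANDLE 0UL

/* Hypothetical pooled object. */
struct pool_obj {
    CK_OBJECT_HANDLE handle;       /* handle currently cached */
    CK_OBJECT_HANDLE token_handle; /* what a fresh lookup returns (simulated) */
    bool raf;                      /* needs a refresh before the next use */
};

/* Invalidate every cached handle but keep the objects allocated, since
 * other structures may still point at them. */
static void pool_reset(struct pool_obj *objs, size_t count)
{
    assert(objs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        objs[i].handle = CK_INVALID_HANDLE;
        objs[i].raf = true;
    }
}

/* Return a usable handle, refreshing lazily if the object was flagged. */
static CK_OBJECT_HANDLE pool_obj_handle(struct pool_obj *obj)
{
    assert(obj != NULL);
    if (obj->raf) {
        obj->handle = obj->token_handle; /* stands in for re-fetching the key */
        obj->raf = false;
    }
    return obj->handle;
}
```

Because the refresh is lazy, objects that are never used again after the reset cost nothing, and each used object is re-fetched at most once.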

In terms of the actual change, I was thinking of doing something like bukka@34d43a8. The diff looks a bit bigger, but effectively it just renames fork_child and exposes it, while keeping fork_child to stay consistent with the other fork functions and possibly to allow some args that differentiate the fork and error cases. For example, there should be some check so that the reset is not done continuously if device errors are repeated in succession. It is not tested yet. I was wondering whether checking for that error code in the interface is the right place. The thing is that this error might happen just once, after which only invalid handle errors show up, so I need to catch it on any call. That is why it seems like the right place to me, but if you think it should be elsewhere, just let me know and I will change it.
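The check against resetting continuously could look roughly like the guard below. It is a sketch under assumptions, not what the linked commit does: `reset_guard` and its limit are hypothetical, and the policy (count consecutive device errors, clear the count on success) is just one simple option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long CK_RV;
#define CKR_OK 0x00000000UL
#define CKR_DEVICE_ERROR 0x00000030UL

/* Hypothetical per-slot guard state. */
struct reset_guard {
    unsigned consecutive; /* device errors seen since the last success */
    unsigned limit;       /* maximum resets allowed in a row */
};

/* Decide whether a call result should trigger a slot reset. */
static bool reset_guard_should_reset(struct reset_guard *g, CK_RV rv)
{
    assert(g != NULL);
    if (rv == CKR_OK) {
        g->consecutive = 0; /* the device works again, re-arm the guard */
        return false;
    }
    if (rv != CKR_DEVICE_ERROR) {
        return false; /* other errors are not handled by a reset */
    }
    if (g->consecutive >= g->limit) {
        return false; /* repeated failures: stop resetting, report the error */
    }
    g->consecutive++;
    return true;
}
```

Checking the return value of every interface call through one such function keeps the policy in a single place, which fits the observation that the device error may appear only once.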

> I wonder if it wouldn't be better to just put pkcs11-provider in an error state and have the application restart ...

This is actually quite difficult because the application is nginx, so it would need some external application that monitors errors and restarts nginx...

I initially thought about handling it in pkcs11-proxy, but it's just a dummy forwarder, so I would have to add all the caching logic there (remembering all objects and sessions) so they can be restored, plus some sort of re-mapping. But the provider already has all this logic, so it seems like the ideal place to do the recovery for my use case.

bukka mentioned this pull request Dec 27, 2024

Invalid object handles refreshing #494

Open

bukka added 3 commits December 27, 2024 17:13

Clear session opening errors on successful re-opening

0cc3b42

Signed-off-by: Jakub Zelenka <[email protected]>

Introduce p11prov_obj_refresh_invalid

d9dc086

This is useful for refreshing invalid object handles Signed-off-by: Jakub Zelenka <[email protected]>

Refresh signature key object on invalid handle

c925ead

Signed-off-by: Jakub Zelenka <[email protected]>

bukka force-pushed the refresh-on-invalid branch from 6b2529b to c925ead December 27, 2024 16:13

simo5 requested changes Jan 6, 2025
