-
Notifications
You must be signed in to change notification settings - Fork 305
Don't panic across FFI #358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 5deccc1
Pretty cool that this works with the existing abort/panic tests.
Oh :) it doesn't, it's just that my local test script doesn't exercise the abort tests. Oops. |
5deccc1
to
6ff72f8
Compare
Yeah, hopefully this fixes it (letting CI test that). |
6ff72f8
to
ffac2d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK ffac2d6
Do we have a no_std test in CI? I assumed we have but apparently no?
secp256k1-sys/src/lib.rs
Outdated
} | ||
|
||
let _bomb = PanicOnDrop(&msg); | ||
panic!("[libsecp256k1] {}", &msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still UB. The unwinder works in two passes. In the first pass it looks for a catch landing pad. In the second pass it actually unwinds. The double panic only happens on the second pass, but on the first pass I think optimizations due to the nounwind attribute may already cause UB. One optimization I can imagine is that LLVM sees that for example rustsecp256k1_v0_4_1_default_illegal_callback_fn
can't unwind and then propagates this info to ffi_abort
which then results in the landingpad for PanicOnDrop
being removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, great that you pointed it out!
I will wrap it in catch_unwind
then and put loop {}
at the end just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should indeed fix the UB I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I just found that catch_unwind
is unavailable without std
, looks like the only way is to loop forever. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, forgot about that. Does this library ever get used in places where there is no libc or would it be possible to call abort()
from libc? I believe that is what std::process::abort()
does on most platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe write a C function that uses a compiler intrinsic to abort and then call this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, using C function is probably the best way, especially since this crate already contains lot of C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjorn3 Yes, this library is widely used in wasm32-unknown-unknown
We thought about writing a small c function that calls __builtin_trap
but that precludes MSVC: #288 (comment)
This brings me back to this: #354 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(side note, Thank you for taking the time to look at this and respond here! @bjorn3 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should let it go.
IIRC we cannot use the C |
ffac2d6
to
40919c2
Compare
Rebased and implemented #354 (comment) |
Oh, forgot to suggest the static trick (see test) in the doc but have to run right now, will fix it soon. |
40919c2
to
0f351a7
Compare
Done. |
0f351a7
to
96ddf07
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 96ddf07
I would like another ACK, maybe from @elichai, before merging.
Thank you for the thorough comment. I think the one about "you should check the docs about abort if you are using no_std" might be unnecessary, since in the expected case it should be impossible to ever call this abort handler.
I was hoping "libsecp256k1 may want to abort in case of invalid inputs. These are definitely bugs." was clear that it shouldn't happen in practice but if you think it's not clear enough I can try reword it. I don't like leaving this, even unlikely, case without calling it out, especially because e.g. embedded platforms usually have some kind of trap/reset instruction and may even have an interface for debug printing the message (seen that on STM32). |
Panicking across FFI was UB in older Rust versions and thus because of MSRV it's safer to avoid it. This replaces the panic with print+abort on `std` and double panic on no-std. Closes rust-bitcoin#354
96ddf07
to
4aabbbd
Compare
Fixed that comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 4aabbbd
I think the comments are fine, thanks for explaining.
I'm unsure that these pointer-casting methods are actually needed, but I can't convince myself that they're not, so I'll let them be. (I think as *const _ as *mut _
should have the same guarantees. But I'm not sure.)
I opened a discussion about this and even Ralf Jung agrees which to me is a very strong indication it's a good idea. :) The other conversion wasn't strictly required but I like keeping it as documentation. |
Nice :) I continue to want one more concept ACK on this before merging. |
secp256k1-sys/src/lib.rs
Outdated
|
||
/// Ensures that types both sides of cast stay in sync and only the constness changes. | ||
/// | ||
/// This elliminates the risk that if we change the type signature of abort handler the cast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/elliminates/eliminates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in separate commit to make review easier.
Did you also check soundness? We need more soundness reviewers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly, I haven't the knowledge for a soundness review on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 540c783
cc @elichai think we can merge this? |
I'd prefer a bit more time to think of alternatives if this isn't urgent. The reasons I'm not excited about this idea:
Some half-baked alternatives suggestions:
(FYI, it looks like abort is definitely a hard problem: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c) |
Not all OSes use a stack guard page. On those that don't a stack overflow may smash the heap. |
I'm not excited about that either. The reason I proposed it is it seemed the least bad option. At the time I thought
There's a bold warning about it in the docs. If someone doesn't read the docs at all, especially for security-critical software, they will have to suffer consequences. There's only so much we can do against stupidity.
Non-issue for
Started to scare me now that I learned it wasn't long ago that it was fixed.
It already does, the return values are just unspecified. We could set an atomic variable and check it after each call or use
I believe the opposite.
Yes, MSRV may be less of a problem for embedded folks who have to use a recent Rust version anyway. Now I can see only these solutions:
Out of these, failing compilation on |
I still think @Kixunil's solution is the least bad option. The idea of smashing the stack is tempting ... the Rust developers do take pains to make this safe even on architectures that don't help but I agree that this feels like a bad and dangerous idea. I also agree that using I also don't want to fail compilation. These functions are really supposed to be impossible to hit, I don't want to inconvenience users for their sake. |
Actually, we can just perform relaxed load from an atomic with panic if it's some value which we will make sure never happens but the compiler should be unable to prove it. Actually, volatile read is even better than atomic. |
I would kinda prefer to just resurrect #288 or similar. This PR is pretty complicated, introducing a global abort handler which can be changed at runtime, for the sake of handling panics which are always programming errors and should never be hit. As described in #288, if our abort handler doesn't actually abort, upstream always turns the error into a runtime 0 return. (But ok, I understand not wanting to rely on that.) I would propose that we remove the ability to replace the handler, even if it makes testing a tiny bit easier. Then, as far as what do to in our fixed handler:
We could also keep ignoring this issue, on the theory that in 1.81+ the behavior is defined and prior to that the behavior seems to be reliable (i.e. the compiler is not deleting half our code or doing other nasal-demon things because of it). We could also attempt to move this function into C, but remember that we don't have a standard library (the existing code originates with upstream |
Panicking across FFI was UB in older Rust versions and thus because of
MSRV it's safer to avoid it. This replaces the panic with print+abort on
std
and double panic on no-std.Closes #354