@Kixunil Kixunil commented Jan 5, 2022

Panicking across FFI was UB in older Rust versions and thus because of
MSRV it's safer to avoid it. This replaces the panic with print+abort on
std and double panic on no-std.

Closes #354

apoelstra previously approved these changes Jan 5, 2022
Member

@apoelstra apoelstra left a comment

ACK 5deccc1

Pretty cool that this works with the existing abort/panic tests.

@apoelstra
Member

Oh :) it doesn't, it's just that my local test script doesn't exercise the abort tests. Oops.

@Kixunil
Collaborator Author

Kixunil commented Jan 5, 2022

Yeah, hopefully this fixes it (letting CI test that).

real-or-random previously approved these changes Jan 5, 2022
Collaborator

@real-or-random real-or-random left a comment

ACK ffac2d6

Do we have a no_std test in CI? I assumed we have but apparently no?

}

let _bomb = PanicOnDrop(&msg);
panic!("[libsecp256k1] {}", &msg)

@bjorn3 bjorn3 Jan 5, 2022

I think this is still UB. The unwinder works in two passes. In the first pass it looks for a catch landing pad. In the second pass it actually unwinds. The double panic only happens on the second pass, but on the first pass I think optimizations due to the nounwind attribute may already cause UB. One optimization I can imagine is that LLVM sees that for example rustsecp256k1_v0_4_1_default_illegal_callback_fn can't unwind and then propagates this info to ffi_abort which then results in the landingpad for PanicOnDrop being removed.

Collaborator Author

That makes sense, great that you pointed it out!

I will wrap it in catch_unwind then and put loop {} at the end just in case.

That should indeed fix the UB I think.
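
For `std` builds, the `catch_unwind` idea discussed above might look roughly like the following. This is a minimal sketch and not the code from this PR: `panicked` and `abort_with_message` are hypothetical names, and everything here depends on `std`, so it does not help no-std targets.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::process;

/// Hypothetical helper: runs `f` and reports whether it panicked,
/// stopping the unwind before it can leave this function.
fn panicked<F: FnOnce()>(f: F) -> bool {
    panic::catch_unwind(AssertUnwindSafe(f)).is_err()
}

/// Hypothetical std-only handler: print the message, then abort.
/// Any panic raised while printing is caught, so nothing unwinds
/// out of this function and across the FFI boundary.
#[allow(dead_code)]
fn abort_with_message(msg: &str) -> ! {
    let _ = panicked(|| eprintln!("[libsecp256k1] {}", msg));
    process::abort()
}
```

The point of the split is that the only way out of `abort_with_message` is `process::abort()`, which never unwinds.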

Collaborator Author

Unfortunately I just found that catch_unwind is unavailable without std, looks like the only way is to loop forever. :(

Right, forgot about that. Does this library ever get used in places where there is no libc or would it be possible to call abort() from libc? I believe that is what std::process::abort() does on most platforms.

Or maybe write a C function that uses a compiler intrinsic to abort and then call this function?

Collaborator Author

Yeah, using a C function is probably the best way, especially since this crate already contains a lot of C.

Member

@bjorn3 Yes, this library is widely used in wasm32-unknown-unknown.
We thought about writing a small C function that calls __builtin_trap but that precludes MSVC: #288 (comment)

This brings me back to this: #354 (comment)

Member

(side note, Thank you for taking the time to look at this and respond here! @bjorn3 )

Collaborator

I think we should let it go.

@apoelstra
Member

IIRC we cannot use the C abort because it is not available in wasm.

@Kixunil
Collaborator Author

Kixunil commented Jan 7, 2022

Rebased and implemented #354 (comment)

@Kixunil
Collaborator Author

Kixunil commented Jan 7, 2022

Oh, forgot to suggest the static trick (see test) in the doc but have to run right now, will fix it soon.

@Kixunil Kixunil marked this pull request as draft January 7, 2022 11:58
@Kixunil Kixunil marked this pull request as ready for review January 7, 2022 15:38
@Kixunil
Collaborator Author

Kixunil commented Jan 7, 2022

Done.

apoelstra previously approved these changes Jan 7, 2022
Member

@apoelstra apoelstra left a comment

ACK 96ddf07

I would like another ACK, maybe from @elichai, before merging.

Thank you for the thorough comment. I think the one about "you should check the docs about abort if you are using no_std" might be unnecessary, since in the expected case it should be impossible to ever call this abort handler.

@Kixunil
Collaborator Author

Kixunil commented Jan 7, 2022

I was hoping "libsecp256k1 may want to abort in case of invalid inputs. These are definitely bugs." was clear that it shouldn't happen in practice, but if you think it's not clear enough I can try to reword it.

I don't like leaving this case, even if unlikely, without calling it out, especially because e.g. embedded platforms usually have some kind of trap/reset instruction and may even have an interface for debug printing the message (seen that on STM32).
IMO it's pretty low effort for potentially avoiding an annoying situation.

@Kixunil
Collaborator Author

Kixunil commented Jan 7, 2022

Fixed that comment.

apoelstra previously approved these changes Jan 7, 2022
Member

@apoelstra apoelstra left a comment

ACK 4aabbbd

I think the comments are fine, thanks for explaining.

I'm unsure that these pointer-casting methods are actually needed, but I can't convince myself that they're not, so I'll let them be. (I think as *const _ as *mut _ should have the same guarantees. But I'm not sure.)
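
To make the trade-off concrete, here is a minimal sketch of what such constness-only cast helpers could look like. The names `to_mut_ptr` and `to_const_ptr` are hypothetical and are not necessarily the ones used in this PR; the idea is only that the pointee type `T` is pinned by the signature, whereas `ptr as *const _ as *mut _` lets inference pick a different pointee type if the surrounding signatures change.

```rust
/// Hypothetical helper: changes only the constness of a raw pointer.
/// The pointee type `T` is the same on both sides of the cast.
fn to_mut_ptr<T>(ptr: *const T) -> *mut T {
    ptr as *mut T
}

/// The reverse direction, kept mostly as documentation of intent.
fn to_const_ptr<T>(ptr: *mut T) -> *const T {
    ptr as *const T
}
```

If the type on one side of a call to these helpers changes, the call stops compiling instead of silently turning into a pointer type cast.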

@Kixunil
Collaborator Author

Kixunil commented Jan 7, 2022

I opened a discussion about this and even Ralf Jung agrees which to me is a very strong indication it's a good idea. :) The other conversion wasn't strictly required but I like keeping it as documentation.

@apoelstra
Member

Nice :)

I continue to want one more concept ACK on this before merging.


/// Ensures that types both sides of cast stay in sync and only the constness changes.
///
/// This elliminates the risk that if we change the type signature of abort handler the cast
Contributor

s/elliminates/eliminates

Collaborator Author

@Kixunil Kixunil Jan 10, 2022

Fixed in separate commit to make review easier.

Did you also check soundness? We need more soundness reviewers.

Contributor

honestly, I haven't the knowledge for a soundness review on this

Member

@apoelstra apoelstra left a comment

ACK 540c783

@apoelstra
Member

cc @elichai think we can merge this?

@elichai
Member

elichai commented Jan 24, 2022

I'd prefer a bit more time to think of alternatives if this isn't urgent.
For the future, I'm trying to get core::intrinsics::abort stabilized

The reasons I'm not excited about this idea:

  1. Setting an abort handler is unergonomic and feels like no user will ever use that (they probably won't even know it exists).
  2. looping forever hangs the CPU and doesn't provide any feedback to the user.
  3. loop{} honestly scares me a little bit (see LLVM loop optimization can make safe programs crash rust-lang/rust#28728)

Some half-baked alternative suggestions:

  1. Can we maybe get upstream to change their definitions such that the calling function is promised to return if the callback returns? then either they promise to return 0/-1 in that case, or we can set a global error atomic here (although that's gonna infect the whole codebase unless we use a thread local which we don't have without stdlib)
  2. Maybe a stack-overflow is better than an infinite loop? (Is that a security risk? also scary)
  3. inline assembly should be stable in rust 1.59, so we can maintain a list of abort instructions (we can copy them from glibc and it should be a one time thing), but I can see why some people won't like it, and also it's a problem with our MSRV.

(FYI, it looks like abort is definitely a hard problem: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c)

@bjorn3

bjorn3 commented Jan 25, 2022

> Maybe a stack-overflow is better than an infinite loop? (Is that a security risk? also scary)

Not all OSes use a stack guard page. On those that don't a stack overflow may smash the heap.

@Kixunil
Collaborator Author

Kixunil commented Jan 25, 2022

I'm not excited about that either. The reason I proposed it is it seemed the least bad option. At the time I thought loop {} was long fixed but now that I look at it again, it was only fixed somewhat recently.

> they probably won't even know it exists

There's a bold warning about it in the docs. If someone doesn't read the docs at all, especially for security-critical software, they will have to suffer consequences. There's only so much we can do against stupidity.

> looping forever hangs the CPU and doesn't provide any feedback to the user

Non-issue for std builds. no-std builds presumably don't have an OS, which means they're running bare metal, so this problem is literally unsolvable in a library.

> loop{} honestly scares me

Started to scare me now that I learned it wasn't long ago that it was fixed.

> Can we maybe get upstream to change their definitions such that the calling function is promised to return if the callback returns?

It already does, the return values are just unspecified. We could set an atomic variable and check it after each call or use &mut context and have it on the stack but that's annoying. Or use setjmp/longjmp. I agree maybe we should make it just return a specified error value.
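
The "set an atomic variable and check it after each call" option could be sketched as below. This is an illustration under assumptions, not code from the PR: `CALLBACK_FIRED`, `record_error` and `checked` are hypothetical names, the real callbacks take a message and a data pointer that are omitted here, and a single global flag is shared by all threads, which is exactly the limitation mentioned earlier in the thread.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

/// Set by the callback. Global, so concurrent callers share it
/// (an assumption that real code would need to revisit).
static CALLBACK_FIRED: AtomicBool = AtomicBool::new(false);

/// Hypothetical callback: records the error and returns instead of aborting.
extern "C" fn record_error() {
    CALLBACK_FIRED.store(true, Ordering::SeqCst);
}

/// Hypothetical wrapper: runs one FFI call and returns `None`
/// if the callback fired while it was running.
fn checked<T>(call: impl FnOnce() -> T) -> Option<T> {
    CALLBACK_FIRED.store(false, Ordering::SeqCst);
    let result = call();
    if CALLBACK_FIRED.swap(false, Ordering::SeqCst) {
        None
    } else {
        Some(result)
    }
}
```

Every FFI call would have to go through a wrapper like `checked`, which is the "annoying" part: the check spreads across the whole codebase.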

> Maybe a stack-overflow is better than an infinite loop?

I believe the opposite.

> inline assembly should be stable in rust 1.59

Yes, MSRV may be less of a problem for embedded folks who have to use a recent Rust version anyway.

Now I can see only these solutions:

  • Make the function a symbol the user has to provide, failing compilation on no_std if it's missing.
  • Bump MSRV for no_std to a high enough value where Rust can support this
  • Convince upstream to return defined error values
  • Do some other ugly hack

Out of these, failing compilation on no_std looks best to me at least for now. In 5 years we will bump MSRV and make it even better.
Note that I also think keeping the ability to set the handler like I implemented is a useful feature so I prefer to not discard the code entirely.

@apoelstra
Member

I still think @Kixunil's solution is the least bad option.

The idea of smashing the stack is tempting ... the Rust developers do take pains to make this safe even on architectures that don't help but I agree that this feels like a bad and dangerous idea. I also agree that using loop {} is quite disconcerting. We could replace that with a couple nested loops which update an atomic value 2^96 times, say, followed by a panic! that we know will never actually be executed.

I also don't want to fail compilation. These functions are really supposed to be impossible to hit, I don't want to inconvenience users for their sake.

@Kixunil
Collaborator Author

Kixunil commented Jan 25, 2022

Actually, we can just perform a relaxed load from an atomic and panic if it holds some value that we make sure never occurs but that the compiler should be unable to rule out. In fact, a volatile read is even better than an atomic.
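
A minimal sketch of the volatile-read variant, assuming a static sentinel that nothing ever writes to. `SENTINEL`, `sentinel_tripped` and `hang` are hypothetical names for illustration; the intent is that each loop iteration performs a read the compiler cannot elide, so the loop is not an empty `loop {}`.

```rust
use core::ptr;

/// Always zero; nothing in the program ever stores a different value.
static SENTINEL: u8 = 0;

/// Volatile read: the compiler must perform the load and
/// cannot assume what value it returns.
fn sentinel_tripped() -> bool {
    unsafe { ptr::read_volatile(&SENTINEL) != 0 }
}

/// Hypothetical no-std fallback: hangs forever. The panic branch is
/// never taken in practice because `SENTINEL` stays zero.
#[allow(dead_code)]
fn hang() -> ! {
    loop {
        if sentinel_tripped() {
            panic!("unreachable: sentinel is never set");
        }
    }
}
```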

@apoelstra
Member

I would kinda prefer to just resurrect #288 or similar.

This PR is pretty complicated, introducing a global abort handler which can be changed at runtime, for the sake of handling panics which are always programming errors and should never be hit. As described in #288, if our abort handler doesn't actually abort, upstream always turns the error into a runtime 0 return. (But ok, I understand not wanting to rely on that.)

I would propose that we remove the ability to replace the handler, even if it makes testing a tiny bit easier.

Then, as far as what to do in our fixed handler:

  • On std call std::process::abort which seems easy enough. We can also print, sleep, whatever.
  • On no-std we have a choice of returning, loop {} (which I don't really trust due to this now-fixed-but-open-forever LLVM issue), spin-looping (when this PR was opened our MSRV was 1.29; now it is 1.63 and we can use core::hint::spin_loop which might be a little nicer to the processor) with some sort of atomic increment or other side-effect, or double-panicking (as this PR suggests it does, but doesn't do).
  • On 1.81+ we can panic! though that would require some sort of compiler-version-detection logic which is either an extra dep or a DRY violation.
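
As an illustration of the spin-loop option in the list above, a no-std fallback could pair `core::hint::spin_loop` with an atomic increment as its side effect. This is a sketch only: `SPIN_COUNT`, `spin_once` and `spin_forever` are hypothetical names, and it assumes the target supports atomic read-modify-write operations, which not every embedded target does.

```rust
use core::hint;
use core::sync::atomic::{AtomicUsize, Ordering};

/// Counter whose only purpose is to give the loop an observable side effect.
static SPIN_COUNT: AtomicUsize = AtomicUsize::new(0);

/// One iteration: bump the counter, then hint to the CPU that we are spinning.
fn spin_once() {
    SPIN_COUNT.fetch_add(1, Ordering::Relaxed);
    hint::spin_loop();
}

/// Hypothetical fixed handler body for no-std: never returns.
#[allow(dead_code)]
fn spin_forever() -> ! {
    loop {
        spin_once();
    }
}
```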

We could also keep ignoring this issue, on the theory that in 1.81+ the behavior is defined and prior to that the behavior seems to be reliable (i.e. the compiler is not deleting half our code or doing other nasal-demon things because of it).

We could also attempt to move this function into C, but remember that we don't have a standard library (the existing code originates with upstream aborting by default, and we asked them to remove that code so we could put Rust in its place, so that our stuff would work in wasm).


Development

Successfully merging this pull request may close these issues.

Undefined behavior: the library panics across FFI boundary

6 participants