Kernel: Trap/signal fixes #26491
|
It's still panicking on aarch64. Or maybe something went wrong on my end with applying the patch? |
|
So I also got this to panic on aarch64 some time ago, but only once (somehow this seems to be really uncommon). The stack trace went directly from |
|
@spholz Given all the issues and additional complexity with the |
I tried again with default QEMU settings (I used the e1000 network card previously) but it still panics at a similar rate. |
|
That's a longer stack trace than what I was seeing, but it's got the same issue (IRQ while handling page fault). We could maybe just not enable interrupts during exception handling (though I think most OSes do enable those). |
|
(disclaimer: I haven't looked at your code yet)
I still think the more correct solution is to do all of this in
Interrupts should be enabled whenever possible (so including during exceptions) to not cause a high interrupt latency.
Not sure. But I think this still should happen with the old approach, it just happens to not work with this new approach somehow. In fact, I've had a similar panic backtrace on RISC-V before in #23387 (see the attached gdb backtrace screenshot). Do you know what actually causes the assertion failure? Maybe we should try to investigate that first.
If the problem is really that we shouldn't page fault, we can ensure that the stack region is mapped before accessing it by calling the page fault handler. But I'd really like to avoid that if possible. |
Oops, no that's wrong. I've misread that assertion, sorry. What I was referring to is the non-negated version of that assertion. The scheduler lock is just held by the signal dispatching code and the code at the top of the stack doesn't expect it to be held. |
It kind of is, but we can't hold on to
Yeah, ideally we wouldn't unconditionally disable/never enable those if we can solve this without doing that.
Apparently aarch64 just unconditionally enables interrupts in |
If there's a higher level trap, then that should take care of signal dispatching, scheduling, etc. when it exits its own trap frame.
This isn't necessary and only leads to deadlocks with SMP.
472b64a to 7f1796d
|
That should take care of #26493 and the panic from #26489 on aarch64 (though let me know if either of those still happen). While testing this, I noticed even with the latest changes from this PR, I haven't profiled this, but I guess this approach might be adding latency to signal dispatching since |
|
I did some pretty thorough testing and confirm that there are no more panics. |
|
|
||
| Processor::disable_interrupts(); | ||
| Processor::current().exit_trap(*trap_frame); | ||
| Processor::current().exit_trap(*trap_frame, true); |
Fixes #26493
Fixes #26488
Fixes (part of) #26489 (the kernel no longer panics, but you can still get dash to crash)
With this, we no longer perform any non-essential steps in ProcessorBase::exit_trap if we're exiting an exception handler and there's a higher level trap we can defer those steps to. Not sure if this is the best approach really (not that I thought of a better one either), but this does get rid of all the kernel panics seen in #26489.

The thread's lock is also no longer taken while (potentially) dispatching signals. AFAICT taking the scheduler lock (which is something that Thread::check_dispatch_pending_signal already does) should provide sufficient locking.
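
To make the deferral idea concrete, here is a minimal standalone sketch. It is not the kernel's actual code: SketchProcessor, its members, and the boolean parameter are hypothetical names invented for illustration, and "higher level trap" is assumed to mean that another trap frame is still active on the same CPU after this one is popped. The "non-essential steps" (signal dispatching, scheduling, etc.) are modeled as a single counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for per-CPU trap bookkeeping.
// This is NOT the real Processor/ProcessorBase class.
struct SketchProcessor {
    uint32_t trap_depth { 0 };             // trap frames currently active on this CPU
    bool deferred_work_pending { false };  // work left for an outer trap to pick up
    uint32_t deferred_work_runs { 0 };     // how often the non-essential steps ran

    void enter_trap() { ++trap_depth; }

    // Pops one trap frame. Returns true if the non-essential steps
    // ran on this exit, false if they were left to an outer trap.
    bool exit_trap(bool exiting_exception_handler)
    {
        assert(trap_depth > 0);
        --trap_depth;

        // Assumption: a "higher level trap" exists if a frame is still active.
        bool has_higher_level_trap = trap_depth > 0;
        if (exiting_exception_handler && has_higher_level_trap) {
            // Leave signal dispatching, scheduling, etc. to the outer trap.
            deferred_work_pending = true;
            return false;
        }

        run_non_essential_steps();
        return true;
    }

private:
    // Models signal dispatching and scheduling as a simple counter.
    void run_non_essential_steps()
    {
        deferred_work_pending = false;
        ++deferred_work_runs;
    }
};
```

In this model, an exception handler that exits while another trap frame is still active only marks the work as pending, and the outer trap performs it once when it exits its own frame.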