Description
Is your feature request related to a problem? Please describe.
The problem is that SMC ("ARC") message submissions can conflict when applications have direct access to the queues. We have multiple queues, but it's never enough to give each possible concurrent application its own queue and we don't have a synchronization mechanism. (Even if we did, queues could be mismanaged or left in an improper state.)
Describe the Solution You'd Like
Applications will post messages through the KMD, which becomes the owner of the queue. The interface will be driven through ioctls: the KMD takes the request and provides a response. User-mode polling is acceptable, at least initially.
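To make the post/poll flow concrete, here is a minimal user-space model in plain C. It is a sketch under assumptions, not the driver's actual interface: the names `kmd_post`, `kmd_poll`, `fw_complete`, the `arc_msg` layout, and the choice of `-EBUSY`/`-EAGAIN`/`-ENOENT` as return codes are all hypothetical. The two functions stand in for whatever ioctls the KMD ends up exposing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message size; the real ARC message format is not specified here. */
#define ARC_MSG_WORDS 8

struct arc_msg {
    uint32_t words[ARC_MSG_WORDS];
};

/* Model of one device whose message queue is owned by the KMD. */
struct kmd_device {
    bool busy;               /* a message is currently in flight */
    int owner_fd;            /* fd that posted the in-flight message */
    bool done;               /* firmware has produced a response */
    struct arc_msg request;  /* copy of the posted request */
    struct arc_msg response; /* response written on completion */
};

/* Stands in for a hypothetical "post" ioctl: accept the message or report busy. */
int kmd_post(struct kmd_device *dev, int fd, const struct arc_msg *req)
{
    assert(dev != NULL && req != NULL);
    if (dev->busy)
        return -EBUSY;       /* assumption: callers retry instead of blocking */
    dev->request = *req;
    dev->owner_fd = fd;
    dev->done = false;
    dev->busy = true;
    return 0;
}

/* Stands in for a hypothetical "poll" ioctl: fetch the response if it is ready. */
int kmd_poll(struct kmd_device *dev, int fd, struct arc_msg *resp)
{
    assert(dev != NULL && resp != NULL);
    if (!dev->busy || dev->owner_fd != fd)
        return -ENOENT;      /* this fd has no message in flight */
    if (!dev->done)
        return -EAGAIN;      /* not finished yet; user mode polls again */
    *resp = dev->response;
    dev->busy = false;       /* queue is free for the next message */
    return 0;
}

/* Test helper that plays the role of firmware completing the message. */
void fw_complete(struct kmd_device *dev, uint32_t status)
{
    assert(dev != NULL);
    memset(&dev->response, 0, sizeof(dev->response));
    dev->response.words[0] = status;
    dev->done = true;
}
```

In this model a second process that posts while a message is in flight gets `-EBUSY` and can retry, so no application ever touches the queue pointers directly. A real driver would also need locking around `busy` and cleanup when an fd is closed with a message outstanding; both are omitted here.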
Describe Alternatives You've Considered
App-to-app locking, using either the KMD locks or any other OS-provided locks. This always has scope problems: how do you make sure that every process agrees on the mapping of lock to device? What if some processes are inside a container and some are outside?
Queue state can be corrected simply by reinitializing it before each submission. (FW doesn't cache the read/write pointers.)
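The reinitialize-before-submit idea can be sketched as follows. This is an illustrative model only: the `arc_queue` layout, `QUEUE_DEPTH`, and the function names are hypothetical and do not reflect the real queue format. It relies on the point made above that firmware does not cache the read/write pointers, so resetting them is safe when no message is in flight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 4 /* hypothetical depth */

/* Hypothetical queue header plus request slots. */
struct arc_queue {
    uint32_t req_wptr;  /* request write pointer (host) */
    uint32_t req_rptr;  /* request read pointer (firmware) */
    uint32_t resp_wptr; /* response write pointer (firmware) */
    uint32_t resp_rptr; /* response read pointer (host) */
    uint32_t slots[QUEUE_DEPTH];
};

/* Reset every pointer and slot to a known-good empty state. */
void queue_reinit(struct arc_queue *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
}

/*
 * Submit one message. The queue is reinitialized first, so whatever state
 * a previous user left behind cannot affect this submission.
 * Returns the index of the slot that was written.
 */
uint32_t queue_submit(struct arc_queue *q, uint32_t msg)
{
    assert(q != NULL);
    queue_reinit(q);
    uint32_t slot = q->req_wptr % QUEUE_DEPTH;
    q->slots[slot] = msg;
    q->req_wptr++;
    return slot;
}
```

Because the queue is wiped on every submission, this approach only works with one message in flight at a time, which matches the single-owner model proposed in this issue.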
Why is this Feature Important?
We need multi-process (multi-thread) safety.
Proposed Design/Technical Details (Optional)
https://tenstorrent.atlassian.net/wiki/spaces/syseng/pages/537723135/ARC+FW+Messages
Use Cases
All software that uses SMC messages should switch to kernel-managed messages.
Additional Context
Performance concerns. I believe that messages are not performance-critical today, although they are used for PCIe DMA. The proposed implementation only allows one message per device fd and does nothing to pipeline messages in hardware. Neither limitation is baked into the architecture.
Polling vs sleep. Users that are not performance-sensitive may prefer to sleep rather than poll for results. Sleeping is not part of the design, but it can be added on BH, which supports MSI.
Shared queue for KMD and app messages. The current design uses the KMD queue for all messages. This means that the KMD has to wait for the current application message to complete before it can submit its own message. But that's already true.