[Feature]: SMC message submission ioctl #187

@alewycky-tenstorrent

Is your feature request related to a problem? Please describe.

SMC ("ARC") message submissions can conflict when applications have direct access to the queues. We have multiple queues, but never enough to give every possible concurrent application its own, and we have no synchronization mechanism. (Even if we did, queues could be mismanaged or left in an improper state.)

Describe the Solution You'd Like

Apps will post messages through the KMD, which becomes the owner of the queue. The interface will be driven through ioctls. The KMD will take the request and provide a response. User-mode polling is acceptable, at least initially.
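To make the shape of the interface concrete, here is a minimal user-space sketch, assuming a submit ioctl plus a separate poll ioctl. Everything named `example_*` / `EXAMPLE_*` (the struct layout, the ioctl magic and numbers, the status encoding) is a placeholder invented for illustration and is not the actual KMD interface.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical request layout; the real KMD interface may differ. */
struct example_smc_msg {
    uint32_t request[8];   /* in:  message words handed to the SMC */
    uint32_t response[8];  /* out: words returned by the firmware */
    uint32_t status;       /* out: 0 = pending, 1 = done (assumed encoding) */
    uint32_t reserved;
};

/* Hypothetical ioctl numbers; the magic 'x' is a placeholder. */
#define EXAMPLE_IOCTL_SMC_SUBMIT _IOW('x', 0x40, struct example_smc_msg)
#define EXAMPLE_IOCTL_SMC_POLL   _IOWR('x', 0x41, struct example_smc_msg)

/*
 * Submit one message through the driver, then poll from user mode until
 * the driver reports completion. Returns 0 on success or a negative errno.
 */
static int example_smc_send(int fd, struct example_smc_msg *msg,
                            unsigned max_polls)
{
    msg->status = 0;
    if (ioctl(fd, EXAMPLE_IOCTL_SMC_SUBMIT, msg) < 0)
        return -errno;

    for (unsigned i = 0; i < max_polls; i++) {
        if (ioctl(fd, EXAMPLE_IOCTL_SMC_POLL, msg) < 0)
            return -errno;
        if (msg->status == 1)
            return 0;      /* response[] is now valid */
    }
    return -ETIMEDOUT;     /* caller may retry the poll later */
}
```

With this split, the application never touches the queue itself; it only sees a request going in and a response coming out.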

Describe Alternatives You've Considered

App-to-app locking, using either the KMD locks or other OS-provided locks. This always has scope problems: how do you make sure that everybody agrees on the mapping of lock to device? What if some processes are inside a container and some are outside?

Queue state can be corrected simply by reinitializing it before each submission. (FW doesn't cache the read/write pointers.)
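As a rough illustration of the reinitialize-before-submit idea, here is a sketch over a made-up queue layout. The struct, the queue depth, and the message size are assumptions; a real driver would access the queue through MMIO with the appropriate barriers, and a reset like this is only safe while no other message is in flight.

```c
#include <stdint.h>

#define EXAMPLE_QUEUE_DEPTH 4u
#define EXAMPLE_MSG_WORDS   8u

/* Hypothetical queue layout: read/write pointers plus request slots. */
struct example_smc_queue {
    uint32_t wptr;  /* advanced by the submitter */
    uint32_t rptr;  /* advanced by the firmware */
    uint32_t slots[EXAMPLE_QUEUE_DEPTH][EXAMPLE_MSG_WORDS];
};

/* Put the queue back into a known-empty state. */
static void example_queue_reset(struct example_smc_queue *q)
{
    q->wptr = 0;
    q->rptr = 0;
}

/*
 * Reset the pointers, then post a single message. Because the firmware
 * does not cache the pointers, whatever state a previous user left
 * behind is discarded before the new message is written.
 */
static void example_queue_submit(struct example_smc_queue *q,
                                 const uint32_t msg[EXAMPLE_MSG_WORDS])
{
    example_queue_reset(q);
    for (uint32_t i = 0; i < EXAMPLE_MSG_WORDS; i++)
        q->slots[q->wptr % EXAMPLE_QUEUE_DEPTH][i] = msg[i];
    q->wptr++;  /* publish the message to the firmware */
}
```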

Why is this Feature Important?

We need multi-process (multi-thread) safety.

Proposed Design/Technical Details (Optional)

https://tenstorrent.atlassian.net/wiki/spaces/syseng/pages/537723135/ARC+FW+Messages

Use Cases

All software that uses SMC messages should switch to kernel-managed messages.

Additional Context

Performance concerns. I believe that messages are not performance-critical today, although they are used for PCIe DMA. The proposed implementation only allows one message per device fd and does nothing to pipeline messages in hardware. None of this is baked into the architecture.
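The one-message-per-fd rule could look roughly like the following bookkeeping sketch. The `example_*` names and the choice of `-EBUSY` / `-EAGAIN` are assumptions for illustration, and a real driver would need locking around this state.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-fd state kept by the driver. */
struct example_fd_state {
    bool     in_flight;  /* a message from this fd is outstanding */
    uint32_t response;   /* last response, valid once in_flight clears */
};

/* Accept a new message only if this fd has none outstanding. */
static int example_submit(struct example_fd_state *st)
{
    if (st->in_flight)
        return -EBUSY;
    st->in_flight = true;
    return 0;
}

/* Called by the driver once the firmware has answered. */
static void example_complete(struct example_fd_state *st, uint32_t response)
{
    st->response = response;
    st->in_flight = false;
}

/* User-mode poll: -EAGAIN while pending, 0 once the response is ready. */
static int example_poll(const struct example_fd_state *st, uint32_t *out)
{
    if (st->in_flight)
        return -EAGAIN;
    *out = st->response;
    return 0;
}
```

An application that wants more than one message outstanding could open the device more than once under this scheme, although nothing here pipelines messages in hardware.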

Polling vs sleep. Users that are not performance-sensitive may prefer to sleep rather than poll for results. Sleeping is not part of the design, but it can be added on BH, which supports MSI.

Shared queue for KMD and app messages. The current design uses the KMD queue for all messages. This means that the KMD has to wait for the current application message to complete before it can submit its own message. But that's already true.

Labels: enhancement (New feature or request)
