[Bounty $1500] Support RM natively within kernel for tensor-tensor binary #28645

@marty1885

Description

📝 Background

Currently, a row-major (RM) eltwise binary op is executed by converting the inputs to tiled layout, performing the eltwise op, and converting the result back to row major. These layout conversions are costly and introduce dependencies on other ops that could be avoided.
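To make the cost concrete, the current path can be sketched host-side. This is plain illustrative C++, not the tt-metal API; the function names are hypothetical, and only the 32x32 tile size matches tt-metal. The point is that two full layout passes bracket a single eltwise pass:

```cpp
#include <cstddef>
#include <vector>

constexpr int TILE = 32; // tt-metal tiles are 32x32 elements

// Row-major -> tiled: each TILE x TILE block becomes contiguous.
std::vector<float> tilize(const std::vector<float>& rm, int H, int W) {
    std::vector<float> tiled(rm.size());
    std::size_t idx = 0;
    for (int ty = 0; ty < H; ty += TILE)
        for (int tx = 0; tx < W; tx += TILE)
            for (int y = 0; y < TILE; ++y)
                for (int x = 0; x < TILE; ++x)
                    tiled[idx++] = rm[std::size_t(ty + y) * W + (tx + x)];
    return tiled;
}

// Tiled -> row-major: exact inverse of tilize.
std::vector<float> untilize(const std::vector<float>& tiled, int H, int W) {
    std::vector<float> rm(tiled.size());
    std::size_t idx = 0;
    for (int ty = 0; ty < H; ty += TILE)
        for (int tx = 0; tx < W; tx += TILE)
            for (int y = 0; y < TILE; ++y)
                for (int x = 0; x < TILE; ++x)
                    rm[std::size_t(ty + y) * W + (tx + x)] = tiled[idx++];
    return rm;
}

// Current approach: three extra passes over memory for one binary op.
std::vector<float> add_via_tilize(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  int H, int W) {
    std::vector<float> ta = tilize(a, H, W);   // conversion pass 1
    std::vector<float> tb = tilize(b, H, W);   // conversion pass 2
    for (std::size_t i = 0; i < ta.size(); ++i) ta[i] += tb[i]; // the op
    return untilize(ta, H, W);                 // conversion pass 3
}
```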

🎯 What Success Looks Like

  • Native row major tensor-tensor binary operations without tiled conversion
  • Elimination of costly RM→tiled→RM conversion overhead
  • Direct eltwise operations on original row major tensors
  • Reduced dependencies on other operations
  • Better overall execution time than the current convert-compute-convert approach

💡 Problem to Solve

The current implementation requires converting row major tensors to tiled format, performing the eltwise operation, and then converting back to row major. This creates unnecessary computational overhead and introduces dependencies that impact performance. The challenge is to implement native row major support within the kernel to perform eltwise operations directly on row major tensors.

🧭 Guidance & Starting Points

  • Examine the current eltwise implementation that uses tiled conversion
  • Identify the kernel components that handle tensor-tensor binary operations
  • Review the row major tensor data layout and access patterns
  • Understand the conversion overhead between RM and tiled formats

🔎 Possible Approaches

  • Implement direct row major tensor access in the kernel
  • Create optimized memory access patterns for RM tensors
  • Develop kernel variants that operate natively on row major data
  • Optimize for memory bandwidth and cache efficiency with RM layout
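One possible shape for such a kernel variant is to process several sticks per step so reads, compute, and writes happen at a buffer-friendly granularity, mirroring how a device kernel would stage data through a circular buffer. A host-side sketch under that assumption (names and chunk size are illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Chunked native RM add: process sticks_per_chunk row-sticks at a time.
void eltwise_add_rm_chunked(const std::vector<float>& a,
                            const std::vector<float>& b,
                            std::vector<float>& out,
                            std::size_t sticks_per_chunk,
                            std::size_t stick_elems) {
    const std::size_t total = a.size() / stick_elems;  // number of sticks
    for (std::size_t s = 0; s < total; s += sticks_per_chunk) {
        std::size_t n = std::min(sticks_per_chunk, total - s) * stick_elems;
        std::size_t base = s * stick_elems;
        // Fused here, but on device these would be the reader,
        // compute, and writer stages operating on one chunk.
        for (std::size_t i = 0; i < n; ++i)
            out[base + i] = a[base + i] + b[base + i];
    }
}
```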

📚 Resources

Metadata

Status: PR Submitted 🕒