[Feature Request] Add full static-shape Telum EP integration for ONNX Runtime on s390x #27315

### Describe the feature request

## Summary
Add full static-shape Telum EP integration for ONNX Runtime on s390x (z16+), including robust capability gating, explicit CPU fallback behavior, and reproducible offload/performance validation.

## Problem
Current Telum EP support is functional but not yet complete across real transformer-like graphs and validation workflows.
Key gaps are:
- partial NNPA function availability across hardware generations
- no clear fallback observability when nodes are not offloaded
- no repeatable proof (offload + acceleration) suitable for upstream review without native Telum CI hardware

## Proposed Feature
1. Keep static-shape-first scope for Telum EP.
2. Use the runtime NNPA capability query as the source of truth for kernel registration and partitioning.
3. Default to CPU fallback (non-strict mode), with a strict mode available for fail-fast behavior.
4. Add reproducible benchmark and validation workflow showing:
   - node placement on Telum
   - CPU vs Telum performance deltas on deterministic static models
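A minimal sketch of how items 1 to 3 could fit together, modeled in plain Python rather than the EP's C++ code. Everything here is hypothetical and for illustration only: `Node`, `partition`, `StrictModeError`, and the op names in `queried_ops` are not taken from ONNX Runtime or zDNN.

```python
from dataclasses import dataclass

CPU = "CPU"
TELUM = "Telum"


@dataclass(frozen=True)
class Node:
    name: str
    op_type: str
    static_shape: bool  # True when every dimension is known at load time


class StrictModeError(RuntimeError):
    """Raised in strict mode when a node cannot be placed on Telum."""


def partition(nodes, supported_ops, strict=False):
    """Map each node name to a provider using the queried capability set."""
    placement = {}
    for node in nodes:
        # Static-shape-first: dynamic-shape nodes are never offloaded.
        offloadable = node.static_shape and node.op_type in supported_ops
        if offloadable:
            placement[node.name] = TELUM
        elif strict:
            # Fail fast instead of silently falling back.
            raise StrictModeError(
                f"{node.name} ({node.op_type}) cannot run on Telum"
            )
        else:
            placement[node.name] = CPU
    return placement


graph = [
    Node("matmul_0", "MatMul", True),
    Node("softmax_0", "Softmax", True),
    Node("gelu_0", "Gelu", True),
    Node("matmul_dyn", "MatMul", False),
]

# Assumed result of a runtime capability query on one machine (placeholder ops).
queried_ops = frozenset({"MatMul", "Softmax"})
placement = partition(graph, queried_ops)
```

With the default non-strict setting, `gelu_0` and `matmul_dyn` land on CPU. Passing `strict=True` raises on the first node that cannot be offloaded.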

## Success Criteria
- Telum-assigned nodes are clearly reported.
- Unsupported NNPA paths fall back predictably (or fail in strict mode).
- Reproducible benchmark evidence demonstrates acceleration on Telum-capable hosts.
- Non-Telum builds remain unaffected.
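One way the first two criteria could be made checkable is a plain-text placement report. The sketch below assumes the EP can emit one `(node, op, provider, reason)` record per node. That record shape, the function name `placement_report`, and the reason strings are all hypothetical.

```python
from collections import Counter


def placement_report(assignments, accelerator="Telum"):
    """Summarize per-provider node counts and list every fallback with its reason."""
    rows = list(assignments)
    totals = Counter(provider for _, _, provider, _ in rows)
    lines = [
        f"{provider}: {count} node(s)"
        for provider, count in sorted(totals.items())
    ]
    for name, op_type, provider, reason in rows:
        if provider != accelerator:
            lines.append(f"fallback: {name} ({op_type}) -> {provider}: {reason}")
    return "\n".join(lines)


# Illustrative records only; not output from a real run.
records = [
    ("matmul_0", "MatMul", "Telum", ""),
    ("gelu_0", "Gelu", "CPU", "op not reported by capability query"),
    ("matmul_1", "MatMul", "Telum", ""),
]
report = placement_report(records)
print(report)
```

A report in this form can be diffed between runs, which helps when reviewers cannot execute on Telum hardware themselves.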

## Non-Goals
- Dynamic-shape full offload in this phase.
- Forcing new external CI infrastructure in upstream repo.

## Request For Maintainer Feedback
- Acceptable CI/testing contract when Telum hardware is unavailable in upstream CI.

   Because upstream CI has no Telum hardware and does not allow third-party runners, we should test Telum capability logic through a stubbed capability interface. The EP keeps the real zDNN-backed capability path in production, but tests can inject deterministic “snapshot” capability profiles (NNPA present/absent, function bits, max-dim limits) to validate GetCapability decisions, partitioning behavior, and CPU fallback paths. This gives reliable regression coverage in standard CI without requiring real Telum execution, while preserving true hardware validation on dedicated s390x/Telum hosts. Runtime behavior on hardware remains unchanged by default.

- Preferred PR slicing for review.

   Suggested:
   1. runtime/provider behavior
   2. kernel/coverage expansion
   3. performance/tooling
   4. docs
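The stubbed capability interface proposed above for hardware-free CI could look roughly like the following. This is a sketch in Python for brevity, although the EP itself is C++. `CapabilitySnapshot`, `CapabilitySource`, `SnapshotSource`, and `can_offload` are hypothetical names, and the function names and dimension limit in the profiles are placeholders rather than real NNPA values.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CapabilitySnapshot:
    nnpa_present: bool
    functions: frozenset  # names of NNPA functions reported as available
    max_dim: int          # largest size accepted for any single dimension


class CapabilitySource(Protocol):
    """Anything that can answer a capability query (real hardware or a stub)."""

    def query(self) -> CapabilitySnapshot: ...


class SnapshotSource:
    """Test double that returns a fixed profile instead of querying hardware."""

    def __init__(self, snapshot: CapabilitySnapshot) -> None:
        self._snapshot = snapshot

    def query(self) -> CapabilitySnapshot:
        return self._snapshot


def can_offload(source: CapabilitySource, function: str, shape) -> bool:
    """Decide offload eligibility purely from the injected capability source."""
    caps = source.query()
    return (
        caps.nnpa_present
        and function in caps.functions
        and all(0 < dim <= caps.max_dim for dim in shape)
    )


# Deterministic profiles for tests (placeholder values, not real hardware data).
NO_NNPA = CapabilitySnapshot(False, frozenset(), 0)
PARTIAL = CapabilitySnapshot(True, frozenset({"MATMUL", "SOFTMAX"}), 1024)
```

In production the same decision code would receive a source backed by the real zDNN query, so injecting a snapshot changes only where the capability data comes from.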


## Initial testing results:
[Telum_EP_Perf_Validation.md](https://github.com/user-attachments/files/25245467/Telum_EP_Perf_Validation.md)

### Note: I see another pending change around s390x. I am not involved with that work, but I would be happy to review it and help resolve any merge conflicts.

### Describe scenario use case

## Use Case Scenario
A production inference service runs transformer-style models on IBM z16/z17 systems.
Operators need:
- deterministic static-shape inference,
- maximal Telum offload where available,
- safe CPU fallback where unavailable,
- visibility into fallback decisions,
- measurable latency/throughput improvements vs CPU-only execution.
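To make the last point concrete, latency could be compared with a small timing helper like the one below. This is a sketch under assumptions: `median_latency` and `speedup` are hypothetical helpers, and `run` stands for any zero-argument callable that performs one inference on fixed static-shape inputs.

```python
import statistics
import time


def median_latency(run, warmup=3, iterations=20):
    """Return the median wall-clock time in seconds of run() after warmup calls."""
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """Return baseline / candidate; values above 1.0 mean the candidate is faster."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds
```

In practice one callable would wrap a CPU-only session and another a Telum-enabled session over the same model and inputs. Reporting the median rather than the mean limits the influence of occasional outlier iterations.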
