[WebGPU] Enable Cast to int64 by default by fanchenkong1 · Pull Request #28804 · microsoft/onnxruntime

fanchenkong1 · 2026-06-05T05:23:46Z

Description

Support casting to int64 from float32 via IEEE-754 bit decomposition.

Introduce a new float_to_int64 helper that emits the value truncated toward zero, across the full int64 range.
The to_ type now always allows int64, regardless of enable_int64; casting from int64 stays gated by enable_int64.
Add cast_op_test.cc coverage for the newly introduced conversions.
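
To make the bit-decomposition idea concrete, here is a minimal C++ sketch of a float32 to int64 conversion that uses only 32-bit integer operations, which is the constraint a WGSL shader works under. It is not the PR's float_to_int64 helper: the names `U32Pair`, `FloatToInt64Bits`, and `ToInt64` are hypothetical, and the saturating behavior for NaN, infinity, and out-of-range inputs is an assumption, since the source does not show how the helper treats them.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical representation of an int64 as two 32-bit words (low, high).
struct U32Pair {
  uint32_t lo;
  uint32_t hi;
};

// Sketch of float32 -> int64 (truncate toward zero) via IEEE-754 fields.
U32Pair FloatToInt64Bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));

  const bool negative = (bits >> 31) != 0;
  const int32_t exponent = static_cast<int32_t>((bits >> 23) & 0xFFu) - 127;
  const uint32_t mantissa = (bits & 0x7FFFFFu) | 0x800000u;  // add implicit leading 1

  // |f| < 1 (including zero and subnormals) truncates to 0.
  if (exponent < 0) return {0u, 0u};

  // Assumption: saturate when |f| >= 2^63 or f is NaN/Inf.
  // Note that -2^63 itself maps to INT64_MIN, which is the exact result.
  if (exponent >= 63) {
    return negative ? U32Pair{0u, 0x80000000u} : U32Pair{0xFFFFFFFFu, 0x7FFFFFFFu};
  }

  uint32_t lo = 0;
  uint32_t hi = 0;
  if (exponent <= 23) {
    // The integer part fits in the mantissa; shift the fraction bits away.
    lo = mantissa >> (23 - exponent);
  } else {
    const int shift = exponent - 23;  // 1..39
    if (shift < 32) {
      lo = mantissa << shift;
      hi = mantissa >> (32 - shift);
    } else {
      hi = mantissa << (shift - 32);
    }
  }

  if (negative) {
    // Two's complement negation across the word pair.
    lo = ~lo + 1u;
    hi = ~hi + (lo == 0u ? 1u : 0u);
  }
  return {lo, hi};
}

// Reassembles the pair on the host for checking.
int64_t ToInt64(U32Pair p) {
  return static_cast<int64_t>((static_cast<uint64_t>(p.hi) << 32) | p.lo);
}
```

For in-range inputs the result can be compared against the host's `static_cast<int64_t>(f)`, which also truncates toward zero.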

Motivation and Context

While running the mask-generation vision encoder (Xenova/sam-vit-base) on the WebGPU EP via Transformers.js, float32-to-int64 Cast nodes fall back to the CPU provider under the default session configuration, because casting to int64 was previously gated behind the enable_int64 flag. This fallback introduces host memcpy and synchronization overhead.

Making cast-to-int64 correct across the full int64 range lets it run on the
WebGPU EP by default, keeping these nodes on-device and eliminating the stalls.

Performance Impact

Measured on the vision_encoder.onnx of Xenova/sam-vit-base (mask-generation, SAM ViT-base vision encoder) on the
WebGPU EP.

Platform	Latency reduction	Speedup
Intel Wildcat Lake	−22.8%	1.30×
Intel Panther Lake	−17.1%	1.21×

This change yields a 1.2–1.3× speedup on the SAM ViT-base vision encoder under the default configuration.

Support casting to int64 from float32/float16 via IEEE-754 bit decomposition. T2 now always allows int64; casting from int64 stays gated by enable_int64. Adds cast_op_test.cc coverage for the newly introduced conversions.

daijh · 2026-06-05T05:57:07Z

@qjia7 @guschmue PTAL.

qjia7 · 2026-06-05T08:19:42Z

+        }
+      }
+    }
+    sh.MainFunctionBody() << "  y[base] = " << values[0] << ";\n";


nit: Please use output.SetByOffset instead of writing to y by raw index.

Copilot

Pull request overview

This PR updates the WebGPU EP’s Cast kernel to allow casting to int64 by default (even when enable_int64 is false), and implements float32/float16 → int64 conversion in WGSL via IEEE-754 bit decomposition to avoid CPU fallback and associated device/host sync overhead.

Changes:

Add an IEEE-754 bit-decomposition path for float → int64 in the WebGPU Cast shader, including lane-safe stores for int64 outputs.
Adjust WebGPU Cast kernel type constraints so T2 (output) always allows int64, while T1 (input) still gates int64 on enable_int64.
Add CPU-side Cast tests covering large float→int64 values and several int32/uint32/bool→int64 and size%4 regression cases.
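
The size%4 cases matter because each invocation handles four consecutive elements, so the last invocation can run past the end of the output when the element count is not a multiple of four. The C++ sketch below simulates that guarded store on the host. It is an illustration only: `StoreFourGuarded` and `CastAllInt32ToInt64` are hypothetical names, and it stores plain int64_t values rather than the 32-bit word pairs a shader would write.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulates one invocation that owns elements [global_idx * 4, global_idx * 4 + 4).
// Each lane is written only if it lies inside the output.
void StoreFourGuarded(std::vector<int64_t>& y, size_t global_idx, const int64_t (&values)[4]) {
  const size_t base = global_idx * 4;
  for (size_t i = 0; i < 4; ++i) {
    if (base + i < y.size()) {
      y[base + i] = values[i];
    }
  }
}

// Runs ceil(n / 4) simulated invocations over an int32 input.
std::vector<int64_t> CastAllInt32ToInt64(const std::vector<int32_t>& input) {
  std::vector<int64_t> y(input.size());
  const size_t invocations = (input.size() + 3) / 4;
  for (size_t g = 0; g < invocations; ++g) {
    int64_t values[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < 4; ++i) {
      const size_t idx = g * 4 + i;
      if (idx < input.size()) {
        values[i] = static_cast<int64_t>(input[idx]);  // sign-extending widen
      }
    }
    StoreFourGuarded(y, g, values);
  }
  return y;
}
```

An input of six elements takes two invocations, and the second writes only its first two lanes.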

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`onnxruntime/core/providers/webgpu/tensor/cast.h`	Extends `CastProgram` parameters and adds `output_size` uniform for int64 tail handling.
`onnxruntime/core/providers/webgpu/tensor/cast.cc`	Implements float→int64 WGSL helper and relaxes output type constraints for int64; updates shader codegen paths.
`onnxruntime/test/providers/cpu/tensor/cast_op_test.cc`	Adds test coverage for float32/float16/int32/uint32/bool to int64 conversions and non-multiple-of-4 sizes.


    sh.MainFunctionBody() << "  let a0 = " << input.GetByOffset("global_idx * 4") << ";\n"
                          << "  let a1 = " << input.GetByOffset("global_idx * 4 + 1") << ";\n"
                          << "  let a2 = " << input.GetByOffset("global_idx * 4 + 2") << ";\n"
                          << "  let a3 = " << input.GetByOffset("global_idx * 4 + 3") << ";\n"
                          << "  let a = vec4<i32>(a0, a1, a2, a3);\n";


+    sh.MainFunctionBody() << "  y[base] = " << values[0] << ";\n";
+    for (size_t i = 1; i < 4; ++i) {
+      sh.MainFunctionBody() << "  if (base + " << i << "u < uniforms.output_size) { y[base + " << i
+                            << "u] = " << values[i] << "; }\n";


fanchenkong1 changed the title ~~[WebGPU EP] Enable Cast to int64 by default~~ [WebGPU] Enable Cast to int64 by default Jun 5, 2026

[WebGPU EP] Enable Cast to int64

53a160d

Support casting to int64 from float32/float16 via IEEE-754 bit decomposition. T2 now always allows int64; casting from int64 stays gated by enable_int64. Adds cast_op_test.cc coverage for the newly introduced conversions.

fanchenkong1 force-pushed the enable-webgpu-float2int64 branch from 30ad07e to 53a160d June 5, 2026 05:54

fanchenkong1 marked this pull request as ready for review June 5, 2026 05:55

qjia7 reviewed Jun 5, 2026


qjia7 requested a review from Copilot June 5, 2026 08:22

Copilot started reviewing on behalf of qjia7 June 5, 2026 08:23

Copilot AI reviewed Jun 5, 2026

fanchenkong1 wants to merge 1 commit into microsoft:main from fanchenkong1:enable-webgpu-float2int64


4 participants
