WebAssembly / Web runtime (both for wasm-simd and WebGPU) #8216
Replies: 9 comments
-
cc: @mcr229 or @digantdesai regarding running XNNPACK via wasm
-
Also cc: @mergennachin
-
I've talked with @digantdesai about this before. I think for XNNPACK he mentioned it should just be plug and play. I've been wanting to try out wasm for some time now, but just haven't had the bandwidth.
-
I also wonder about the fusion capabilities of ExecuTorch :) Does it allow Inductor-codegen'd fused kernels (e.g. quant/dequant fused directly into the flash attention kernel, with the positional embedding computation also fused into that kernel)?
Another interesting backend is WebGPU/wgpu: https://github.com/huggingface/ratchet — or even direct wgpu/WGSL shaders could in theory be a compilation target for fused kernels.
But even if ExecuTorch does not support wild codegen/fusions, it would still be good to have it as a baseline, with comparisons against ort-web, tflite+tfjs, tvm-wasm, and ggml compiled to wasm. That should show roughly where all these frameworks stand (especially if compiling is relatively doable).
-
And given that PyTorch currently does not have its own inference wasm/WebGPU story, having ExecuTorch compiled to wasm-simd might be a nice baseline to have (especially if it's minimalistic and relatively simple to compile).
-
I suspect much of the core should be compilable with the Emscripten C++ compiler. Probably not the optimized operators, though, and I'm not too sure about backends/xnnpack.
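To make this concrete, here is a minimal sketch of what such a build might look like. This is a hypothetical recipe, not ExecuTorch's documented wasm workflow: it assumes the emsdk is installed and activated, and the CMake invocation shown is a guess (the real build may need extra flags to disable native-only backends).

```shell
# Hypothetical build recipe: cross-compile the ExecuTorch core with Emscripten.
# Assumes the emsdk is installed and its environment script has been sourced.
git clone https://github.com/pytorch/executorch.git
cd executorch

# emcmake wraps CMake with the Emscripten toolchain file, retargeting the
# usual CMake build to wasm. Flags beyond the build type are assumptions.
emcmake cmake -B cmake-out-wasm -DCMAKE_BUILD_TYPE=Release
cmake --build cmake-out-wasm -j"$(nproc)"
```

If the core really is portable C++, attempting this should at least surface which operators and backends (e.g. backends/xnnpack) need wasm-specific handling.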
-
Maybe the best start would be adding some sort of GitHub Actions CI job that compiles it with Emscripten... (even if no tests exercising it exist so far)
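As a sketch, such a compile-only smoke-test job could look like the following. This is a hypothetical config fragment, not an existing ExecuTorch workflow: the workflow name, the emsdk setup action, and the CMake flags are all assumptions.

```yaml
# Hypothetical GitHub Actions job: check that the core compiles under Emscripten.
# The setup action and CMake flags are assumptions, not taken from real CI.
name: wasm-smoke-build
on: [push, pull_request]

jobs:
  emscripten-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      # Installs and activates the Emscripten SDK (emcc, emcmake, etc.)
      - uses: mymindstorm/setup-emsdk@v14
      - name: Configure with the Emscripten toolchain
        run: emcmake cmake -B cmake-out-wasm -DCMAKE_BUILD_TYPE=Release
      - name: Build
        run: cmake --build cmake-out-wasm -j2
```

Even without running any tests, a job like this would catch portability regressions (non-portable intrinsics, platform-specific syscalls) as soon as they land.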
-
It should be, given its bunch of WASM[SIMD] kernels. I haven't tried it myself, though. IIRC there isn't any CI for that on github/xnnpack either.
-
XNNPACK is also known to compile (and maybe is even tested) for wasm/simd, so somehow this should be achievable... I don't know if any compact backend library/project exists for WebGPU kernels.
-
I'm wondering whether ExecuTorch can be compiled for a WebAssembly target. As far as I understand, XNNPACK exists for wasm-simd, so theoretically at least the CPU path should be doable? (e.g. to be compared with tflite+tfjs, ort-web, and tvm-wasm, at least for some popular models like MobileNets)
(This is especially interesting if strong fusion/codegen can be done to produce fused wasm-simd code / fused WebGPU programs, although maybe that is an ask for Inductor.)