Read next:
- solver priorities: SOLVER_ROADMAP.md
- product execution: PRODUCT_ROADMAP.md
This note analyzes where WebGPU is a good fit for Dedaliano and where it is not.
The split matters:
- renderer work is primarily a product and visualization problem
- solver work is primarily a numerical-methods and performance problem
Those are not the same ROI.
- WebGPU for rendering: good fit, but only if the current viewport becomes a real bottleneck
- WebGPU for solver acceleration: possible in selective areas, but not the right first move for the current direct sparse CPU solver
Best immediate candidates:
- renderer / visualization
- large postprocessing kernels
- later, iterative-solver kernels if Krylov methods are added
Poor early candidates:
- sparse direct factorization
- ordering / graph algorithms
- small mixed models where GPU overhead dominates
Rendering is naturally GPU-shaped:
- mesh drawing
- contour shading
- deformed shape visualization
- picking and highlighting
- animated modes and time-history playback
- large shell/result-field overlays
WebGPU is a good fit when the product needs:
- smoother interaction on large shell-heavy models
- more demanding stress/contour visualization
- more complex highlighting and selection
- a longer-term modern graphics pipeline than WebGL
The areas that would benefit most:
- Large shell meshes: stress contours, deformed shapes, mode shapes, thermal fields, and shell-family comparisons.
- Result overlays: many simultaneous views:
  - undeformed + deformed
  - shell stresses
  - constraint-force overlays
  - support/diagnostic highlights
- Interaction: picking, hover, click-to-focus, diagnostic highlighting, section cuts, and result filtering.
- Animation: modal shape playback, harmonic response, time-history sequences.
WebGPU rendering does not help with:
- Solver runtime
- Sparse factorization
- Ordering/fill
- Nonlinear convergence
So WebGPU rendering should be treated as a visualization/product upgrade, not a solver-performance fix.
Use WebGPU for rendering only if profiling shows that the current viewer is becoming the bottleneck on:
- shell-heavy models
- large result fields
- interactive contour updates
- animated responses
If the current problems are still mostly:
- onboarding
- diagnostics UX
- reports
- interoperability
- solver runtime
then renderer migration should not be the first product priority.
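One way to make the profiling gate above concrete is to summarize frame durations recorded while orbiting or animating a large model. The sketch below is illustrative only: `summarizeFrameTimes`, `FrameStats`, and the 60 fps budget are assumptions, not part of Dedaliano's code.

```typescript
// Hypothetical helper: summarize frame durations (in ms) sampled from the
// current viewer while interacting with a large model.
interface FrameStats {
  medianMs: number;
  p95Ms: number;
  overBudgetFraction: number; // share of frames slower than the budget
}

function summarizeFrameTimes(frameMs: number[], budgetMs = 1000 / 60): FrameStats {
  if (frameMs.length === 0) throw new Error("no frame samples");
  const sorted = [...frameMs].sort((a, b) => a - b);
  // Nearest-rank percentile on the sorted samples.
  const pick = (q: number): number =>
    sorted[Math.min(sorted.length - 1, Math.max(0, Math.ceil(q * sorted.length) - 1))];
  const over = frameMs.filter((t) => t > budgetMs).length;
  return {
    medianMs: pick(0.5),
    p95Ms: pick(0.95),
    overBudgetFraction: over / frameMs.length,
  };
}
```

In the browser, the samples could be the differences between successive `requestAnimationFrame` timestamps. A consistently high `overBudgetFraction` on shell-heavy models would support a renderer migration; a low one suggests the viewport is not yet the bottleneck.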
This is the current core solve path:
- sparse Cholesky
- symbolic structure
- ordering
- fill control
That class of work is a poor first WebGPU target because:
- sparse factorization is irregular
- fill and pivot behavior are hard to map well to browser GPU kernels
- ordering/fill logic is not GPU-friendly
- data movement and synchronization can erase gains
This is especially true while Dedaliano is still improving:
- sparse eigensolver depth
- factorization/runtime balance
- ordering policy
AMD, RCM, fill-reduction logic, elimination trees, and related graph operations are not good early GPU wins.
If Dedaliano adds:
- PCG
- GMRES
- MINRES
then WebGPU becomes much more plausible for:
- sparse mat-vec
- dot products
- vector updates
- residual evaluation
These are much more GPU-shaped than sparse direct factorization.
This means WebGPU for the solver starts making real sense only after:
- Krylov methods exist
- preconditioner strategy exists
- the CPU direct sparse path is already healthy enough to compare against
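To show why these kernels are GPU-shaped, here is a minimal CPU reference sketch of Jacobi-preconditioned CG on a CSR matrix. Every name (`CsrMatrix`, `spmv`, `dot`, `axpy`, `pcgJacobi`) is hypothetical and the Jacobi preconditioner is chosen only for brevity; this is not Dedaliano's solver. Each helper is a flat loop over rows or entries, which is the kind of work that maps onto a compute kernel.

```typescript
// Illustrative CSR storage for a symmetric positive-definite matrix.
interface CsrMatrix {
  n: number;
  rowPtr: Int32Array;   // length n + 1
  colIdx: Int32Array;   // length nnz
  values: Float64Array; // length nnz
}

// y = A x. Rows are independent, so each row could be one GPU invocation.
function spmv(A: CsrMatrix, x: Float64Array, y: Float64Array): void {
  for (let i = 0; i < A.n; i++) {
    let sum = 0;
    for (let k = A.rowPtr[i]; k < A.rowPtr[i + 1]; k++) {
      sum += A.values[k] * x[A.colIdx[k]];
    }
    y[i] = sum;
  }
}

// Dot product: a parallel reduction on a GPU.
function dot(a: Float64Array, b: Float64Array): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// y += alpha * x: an element-wise vector update.
function axpy(alpha: number, x: Float64Array, y: Float64Array): void {
  for (let i = 0; i < x.length; i++) y[i] += alpha * x[i];
}

function pcgJacobi(A: CsrMatrix, b: Float64Array, tol = 1e-10, maxIter = 1000): Float64Array {
  const n = A.n;
  const x = new Float64Array(n);  // initial guess x = 0
  const r = Float64Array.from(b); // residual r = b - A x = b
  const invDiag = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    for (let k = A.rowPtr[i]; k < A.rowPtr[i + 1]; k++) {
      if (A.colIdx[k] === i) invDiag[i] = 1 / A.values[k];
    }
  }
  const z = r.map((ri, i) => ri * invDiag[i]); // z = M^-1 r
  const p = Float64Array.from(z);
  const Ap = new Float64Array(n);
  let rz = dot(r, z);
  const bNorm = Math.sqrt(dot(b, b)) || 1;
  for (let it = 0; it < maxIter; it++) {
    // Residual evaluation: stop once the relative residual is small enough.
    if (Math.sqrt(dot(r, r)) / bNorm < tol) break;
    spmv(A, p, Ap);
    const alpha = rz / dot(p, Ap);
    axpy(alpha, p, x);
    axpy(-alpha, Ap, r);
    for (let i = 0; i < n; i++) z[i] = r[i] * invDiag[i];
    const rzNew = dot(r, z);
    const beta = rzNew / rz;
    for (let i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
    rz = rzNew;
  }
  return x;
}
```

A WebGPU port would also have to handle precision: WGSL offers `f32` (and optional `f16`) but no 64-bit float type, so the `Float64Array` arithmetic used here does not carry over directly.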
Large uniform shell meshes could potentially benefit from GPU execution of:
- element stiffness/load evaluation
- shell postprocessing kernels
- stress recovery / nodal accumulation
But this is only worth it after profiling proves that element math dominates.
Recent profiling has shown the real performance bottlenecks shifting through:
- sparse matrix construction
- overbuilding dense/sparse forms
- factorization / ordering behavior
not raw per-element floating-point work in the common cases.
Postprocessing is the best solver-adjacent GPU target before full iterative methods:
- stress recovery
- nodal averaging
- contour field preparation
- repeated result-field transforms for visualization
These are more regular and easier to parallelize than direct sparse solves.
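As an illustration of how regular this work is, the following sketch averages element-corner values to nodes on the CPU. The function name and the flattened data layout are assumptions made for the example, not Dedaliano's actual result format.

```typescript
// Hypothetical nodal averaging: scatter-add element-corner values to nodes,
// then divide by the number of contributions per node.
// `connectivity` is flattened: one node index per element corner.
// `cornerValues` holds one value per element corner in the same order.
function nodalAverage(
  numNodes: number,
  connectivity: Uint32Array,
  cornerValues: Float32Array,
): Float32Array {
  const sum = new Float32Array(numNodes);
  const count = new Uint32Array(numNodes);
  for (let c = 0; c < connectivity.length; c++) {
    const node = connectivity[c];
    sum[node] += cornerValues[c];
    count[node] += 1;
  }
  for (let n = 0; n < numNodes; n++) {
    // Nodes with no attached element keep the value 0.
    if (count[n] > 0) sum[n] /= count[n];
  }
  return sum;
}
```

On a GPU the scatter-add step creates write conflicts where elements share a node. WGSL atomics are limited to 32-bit integers, so a per-node gather over adjacent elements is likely the simpler formulation there.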
Suggested priorities:
- Research only: keep a WebGPU track alive for:
  - renderer architecture
  - result-field visualization
  - later iterative-solver kernels
- Potential product candidate: WebGPU renderer, but only after profiling proves the viewport is a real bottleneck on large shell/result models.
- GPU-accelerated postprocessing: especially shell stresses, contours, nodal averaging, and mode-shape visualization.
- GPU iterative linear algebra: only after Krylov methods are added and the CPU sparse baseline is already strong.
Not recommended:
- replacing sparse Cholesky with WebGPU
- moving ordering/fill logic to GPU
- solving current scale problems by GPU rewrite instead of direct sparse-path hardening
If Dedaliano wants to explore WebGPU responsibly, the order should be:
1. profile the current renderer
2. use WebGPU for visualization only if it is actually a bottleneck
3. consider GPU postprocessing kernels
4. add iterative solvers and preconditioners on CPU first
5. only then evaluate WebGPU iterative kernels
Not:
- rewrite sparse direct solves for GPU
- rewrite the whole app in Rust
- assume GPU automatically improves structural FEM
WebGPU is a strong fit for:
- rendering
- visualization
- result-field processing
WebGPU is not the best first answer for:
- sparse direct structural solves
- ordering/fill problems
- the current linear-solver bottlenecks
So the practical answer is:
- renderer: yes, when/if viewport scale makes it worthwhile
- solver: yes, selectively later, mainly if Dedaliano adds iterative methods
There is substantial research literature around GPU-accelerated FEM and structural solvers, but the highest-value directions are not "move the current sparse direct solver to GPU unchanged."
The most relevant long-term tracks for Dedaliano are:
Best-fit research direction for structural FEM on GPU:
- CG/PCG
- GMRES
- MINRES
- block Krylov variants
Why relevant: sparse matrix-vector products, dot products, and vector updates map naturally to GPU kernels.
Supporting pieces for iterative methods:
- sparse mat-vec
- sparse triangular solves
- preconditioner application
These are far more realistic than GPU sparse direct factorization as an early solver-GPU target.
Batched element stiffness/load evaluation can make sense for:
- large uniform shell meshes
- repeated element-level postprocessing
- heavier shell families where per-element work is significant
This is plausible later, but only after profiling proves element math dominates rather than sparse data-structure overhead or factorization.
Mostly iterative eigensolver research:
- Lanczos
- Arnoldi
- LOBPCG
This becomes relevant if Dedaliano continues pushing sparse eigensolver depth.
A stronger long-term research direction than GPU sparse direct factorization:
- do not explicitly assemble the full global matrix
- apply the operator directly
- combine with iterative solvers / multigrid
This is the most plausible "deep GPU solver" direction if Dedaliano ever wants a major GPU solver program.
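The matrix-free idea can be illustrated with a toy case: applying the stiffness operator of a chain of 1D two-node bar elements without ever assembling a global matrix. This is a sketch under simplifying assumptions (one DOF per node, hypothetical names `applyBarOperator` and `isFixed`); shell elements would need far more per-element work, but the structure is the same.

```typescript
// Matrix-free y = K x for a chain of 1D bar elements.
// Element e connects node e and node e + 1 with axial stiffness k[e] (EA/L).
// isFixed[d] === 1 marks a constrained DOF.
function applyBarOperator(
  k: Float64Array,
  x: Float64Array,
  isFixed: Uint8Array,
): Float64Array {
  const y = new Float64Array(x.length);
  for (let e = 0; e < k.length; e++) {
    const a = e;
    const b = e + 1;
    // Constrained DOFs contribute nothing to their neighbours.
    const xa = isFixed[a] ? 0 : x[a];
    const xb = isFixed[b] ? 0 : x[b];
    // Local 2x2 stiffness k * [[1, -1], [-1, 1]] applied on the fly.
    const f = k[e] * (xa - xb);
    y[a] += f;
    y[b] -= f;
  }
  // Constrained rows act as identity so the operator stays symmetric.
  for (let d = 0; d < x.length; d++) {
    if (isFixed[d]) y[d] = x[d];
  }
  return y;
}
```

A function like this can stand in for the sparse mat-vec inside an iterative solver, which is why the matrix-free direction only becomes useful once Krylov methods exist.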
Research exists, but this is not the first practical target:
- sparse Cholesky
- sparse LDL^T
- sparse LU
Why lower-priority: fill-in, ordering, pivoting, and irregular memory access make this much harder than iterative approaches.
Important for very large FEM systems, but higher complexity than the near-term Dedaliano roadmap.
If Dedaliano adds a long-term GPU program, the sensible order is:
1. WebGPU renderer and result visualization
2. GPU postprocessing kernels
3. Iterative linear solver research on CPU first
4. GPU acceleration for iterative solver kernels
5. Maybe batched shell element kernels
6. Maybe matrix-free structural operators
7. Only much later, if justified: GPU sparse direct factorization
The strongest GPU research path for Dedaliano is:
- keep direct sparse solves on CPU for now
- improve the renderer and visualization pipeline on WebGPU
- explore GPU acceleration later through iterative methods and matrix-free/operator-style workflows
That is a much more realistic and higher-ROI trajectory than trying to port the current sparse direct solver architecture to GPU as-is.