The `RemoveDuplicateCastTransformer` naively removed Cast nodes from the
graph without considering precision loss between types in the same
`TypeGroup`. For instance, F64 -> F32 -> F64 would be optimised out of
the graph, even though the intermediate F32 cast discards precision.
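
To make the F64 -> F32 -> F64 case concrete, here is a minimal sketch in plain C++. It uses `static_cast` as a stand-in for Cast nodes, and `RoundTripThroughFloat` is a hypothetical helper written for illustration, not code from this PR or from the transformer.

```cpp
// Hypothetical helper: mirrors the Cast chain F64 -> F32 -> F64
// using ordinary C++ conversions (an assumption, not the ORT kernels).
double RoundTripThroughFloat(double x) {
  // Narrowing to float rounds x to the nearest representable float.
  float narrowed = static_cast<float>(x);
  // Widening back to double is exact, but the lost bits do not return.
  return static_cast<double>(narrowed);
}
```

For a value such as `0.1`, which is not exactly representable in either format, `RoundTripThroughFloat(0.1)` differs from `0.1`, so removing both casts changes the result. A value like `0.5`, which is exact in both formats, survives the round trip unchanged, which is why the problem is easy to miss in tests.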
I also noticed that signedness was not accounted for, which is not
covered by any existing issue but is a problem. For example, doing int ->
unsigned int -> int produces very different values for negative inputs
and so should not be optimised out.
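
A small sketch of the signedness problem, again in plain C++ rather than with real Cast nodes. The concrete widths (int8 -> uint8 -> int32) are my assumption, chosen because the effect is easy to see when the final type is wider than the intermediate one. `CastViaUnsigned` and `CastDirect` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical helper: applies the chain int8 -> uint8 -> int32 one cast at a time.
std::int32_t CastViaUnsigned(std::int8_t x) {
  // Conversion to unsigned is modulo 2^8, so -1 becomes 255.
  auto as_unsigned = static_cast<std::uint8_t>(x);
  // Widening an unsigned value zero-extends, so 255 stays 255.
  return static_cast<std::int32_t>(as_unsigned);
}

// Hypothetical helper: the result if the intermediate unsigned cast were eliminated.
std::int32_t CastDirect(std::int8_t x) {
  // Widening a signed value sign-extends, so -1 stays -1.
  return static_cast<std::int32_t>(x);
}
```

With an input of `-1`, `CastViaUnsigned` yields `255` while `CastDirect` yields `-1`. Non-negative inputs give the same answer on both paths, so only negative values expose the difference.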
One could argue that we shouldn't be performing such cast elimination at
all (at least not in this transformer). The scope could reasonably be
restricted to eliminating only the unnecessary casts inserted by the
`InsertCastTransformer` and no others.
### Motivation and Context
This should fix #17565,
https://github.com//issues/9915 and
#8787.
// This is not a complete cast optimisation pass, and is more conservative than it could be.
// For instance, certain integral -> floating point casts could be optimised but this is left to an explicit cast optimisation pass.

// The comparison with "InsertedPrecisionFreeCast_" reflects cast nodes that are inserted by InsertCastTransformer.
// Such casts should not be considered as loss of precision - the inserted upcasts (f16 -> f32) and downcasts (f32 -> f16) are inserted to support kernels when on a CPU EP without F16 support.
auto src_type_group = GetTypeGroup(src_type);
auto dst_type_group = GetTypeGroup(dst_type);
if (Unknown == src_type_group || Unknown == dst_type_group) {