test(TPU): relax numerical tests for TPUs + parallel TPU testing#2609
@sbrantq can you take a look at the probprog tests for TPU? https://github.com/EnzymeAD/Reactant.jl/actions/runs/22694756046/job/65798539411?pr=2609 Most are likely just tolerance issues.
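For illustration, a minimal sketch of what a backend-dependent tolerance could look like. The helper name `test_rtol`, the backend strings, and both tolerance values are hypothetical and not taken from this PR; only `isapprox` is standard Julia.

```julia
# Hypothetical helper: choose a relative tolerance from the backend name.
# The default values are placeholders, not the ones used in the test suite.
function test_rtol(backend::AbstractString; cpu_rtol=1e-6, tpu_rtol=1e-2)
    return lowercase(backend) == "tpu" ? tpu_rtol : cpu_rtol
end

expected = [1.0, 2.0, 3.0]
# Simulated accelerator result with a relative error of about 0.1%.
computed = expected .* (1 + 1e-3)

# Elementwise comparison under each tolerance.
passes_on_cpu = all(isapprox.(computed, expected; rtol=test_rtol("cpu")))
passes_on_tpu = all(isapprox.(computed, expected; rtol=test_rtol("tpu")))
```

With these placeholder values the same result fails the strict CPU tolerance and passes the relaxed TPU one, which is the kind of failure a tolerance issue would produce.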
Looks like the JAX run used to generate the reference results is falling back to CPU here, which gives different RNG results. I guess the fix would be to do the pointwise check only when Reactant is using a CPU backend (will commit a fix shortly).
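A rough sketch of that idea, under stated assumptions: the function name `check_against_reference`, the `on_cpu` flag, and the non-CPU fallback (shape and finiteness only) are hypothetical. The comment above only says that the pointwise check would be restricted to the CPU backend.

```julia
# Hypothetical check: compare pointwise against the reference only on CPU,
# where the RNG stream is expected to match the reference generator.
function check_against_reference(samples, reference; on_cpu::Bool, rtol=1e-6)
    if on_cpu
        # Same RNG stream expected, so every element should agree.
        return all(isapprox.(samples, reference; rtol=rtol))
    else
        # Assumed fallback: different RNG stream, so only structural checks.
        return size(samples) == size(reference) && all(isfinite, samples)
    end
end

reference = [0.1, -0.4, 1.3]
other_stream = [0.7, 0.2, -0.9]  # stands in for samples from a different RNG

check_against_reference(other_stream, reference; on_cpu=false)  # structural check only
```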
    end
else
    if RunningOnTPU
        @warn "Skipping MultiRotate test on TPU"
What crashes here? Will investigate concurrently.
@sbrantq should we disable these on TPU? Seems like an XLA internal bug:
│ F0306 03:13:48.227292 15978 shape_util.cc:1214] Check failed: return_shape->IsTuple() Invalid index {0} for shape u32[2,2]{1,0}
│ *** Check failure stack trace: ***
│ @ 0x7a3282749064 absl::log_internal::LogMessage::SendToLog()
│ @ 0x7a3282749018 absl::log_internal::LogMessage::Flush()
│ @ 0x7a32820fa4d4 xla::ShapeUtil::GetSubshape()
│ @ 0x7a327527bdfb xla::ShapeTree<>::CopySubtreeFrom()
│ @ 0x7a327527a843 xla::HloReplicationAnalysis::ComputeHloReplicationOnComputation()
│ @ 0x7a327527b1da xla::HloReplicationAnalysis::ComputeHloReplicationOnComputation()
│ @ 0x7a3275279d60 xla::HloReplicationAnalysis::ComputeHloReplicationOnComputation()
│ @ 0x7a3275279d60 xla::HloReplicationAnalysis::ComputeHloReplicationOnComputation()
│ @ 0x7a3275279d60 xla::HloReplicationAnalysis::ComputeHloReplicationOnComputation()
│ @ 0x7a327527bf3a xla::HloReplicationAnalysis::ComputeHloReplication()
│ @ 0x7a327527e33b xla::HloReplicationAnalysis::Run()
│ @ 0x7a327527e245 xla::HloReplicationAnalysis::Run()
│ @ 0x7a327523ce75 xla::AllReduceSimplifier::RunImpl()
│ @ 0x7a327e7320e5 xla::HloPassPipeline::RunPassesInternal<>()
│ @ 0x7a327e73192f xla::HloPassPipeline::RunImpl()
│ @ 0x7a3273223fca xla::HloPassFix<>::RunOnChangedComputationsOnce()
│ @ 0x7a3273223681 xla::HloPassFix<>::RunToFixPoint()
│ @ 0x7a3273223260 xla::HloPassFix<>::RunImpl()
│ @ 0x7a327e7320e5 xla::HloPassPipeline::RunPassesInternal<>()
│ @ 0x7a327e73192f xla::HloPassPipeline::RunImpl()
│ @ 0x7a32732196c9 xla::jellyfish::(anonymous namespace)::HloOptimizeThroughLayoutAssignment()::$_0::operator()()
│ @ 0x7a327df59c93 absl::internal_any_invocable::LocalInvoker<>()
│ @ 0x7a32825a42d6 Thread::ThreadBody()
│ @ 0x7a39edd17aa4 (unknown)
│ @ 0x7a39edda4c6c (unknown)
│ https://symbolize.stripped_domain/r/?trace=7a3282749063,7a3282749017,7a32820fa4d3,7a327527bdfa,7a327527a842,7a327527b1d9,7a3275279d5f,7a3275279d5f,7a3275279d5f,7a327527bf39,7a327527e33a,7a327527e244,7a327523ce74,7a327e7320e4,7a327e73192e,7a3273223fc9,7a3273223680,7a327322325f,7a327e7320e4,7a327e73192e,7a32732196c8,7a327df59c92,7a32825a42d5,7a39edd17aa3,7a39edda4c6b&map=
│
│ [13014] signal 6 (-6): Aborted
│ in expression starting at /__w/Reactant.jl/Reactant.jl/test/probprog/mcmc_logpdf.jl:38