Fix worktree counting on Julia 1.13 by tecosaur · Pull Request #64 · JuliaSIMD/ThreadingUtilities.jl

tecosaur · 2026-05-02T13:26:18Z

Rather trivial. Fixes #63.

Can confirm that all tests, and Polyester.jl tests, now pass on 1.13-rc1.

codecov · 2026-05-02T13:30:24Z

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 95.20%. Comparing base (cb9cf25) to head (6a78229).
⚠️ Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
src/threadtasks.jl	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
- Coverage   99.20%   95.20%   -4.01%     
==========================================
  Files           4        4              
  Lines         126      125       -1     
==========================================
- Hits          125      119       -6     
- Misses          1        6       +5


tecosaur · 2026-05-02T13:37:54Z

The invalidations check itself seems broken, rather than anything to do with this change.

tecosaur · 2026-05-25T06:56:12Z

@oscardssmith you seem like you might be a good person to poke about this.

ChrisRackauckas

Verified: the 1.13+ branch uses Base.Workqueues(), which returns the current thread's IntrusiveLinkedListSynchronized{Task}, and Base.Workqueues[tidp1] continues to work via OncePerThread's getindex. CI is green on nightly across linux/macos/windows, including aarch64.
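For readers unfamiliar with the two access styles mentioned in this review, here is a minimal sketch of how a `OncePerThread` container can be called (current thread) or indexed (given thread id). It deliberately does not touch `Base.Workqueues`: `queues`, `current_queue`, and `queue_for` are hypothetical names, and the `isdefined` guard is an assumption made for this illustration, not the check used in the PR.

```julia
# Sketch only: `queues` is a hypothetical stand-in for a per-thread container.
# It does not touch the scheduler's real work queues.
@static if isdefined(Base, :OncePerThread)
    # Newer Julia: one lazily initialised value per thread.
    const queues = Base.OncePerThread{Vector{Int}}(() -> Int[])
    current_queue() = queues()               # call syntax: this thread's value
    queue_for(tid::Integer) = queues[tid]    # index syntax: value for thread `tid`
else
    # Older Julia: fall back to a plain vector indexed by thread id.
    const nslots = isdefined(Threads, :maxthreadid) ? Threads.maxthreadid() : Threads.nthreads()
    const queues = [Int[] for _ in 1:nslots]
    current_queue() = queues[Threads.threadid()]
    queue_for(tid::Integer) = queues[tid]
end

push!(current_queue(), 42)
tid = Threads.threadid()

# Both access styles should reach the same per-thread object.
println(queue_for(tid) === current_queue())
```

Indexing another thread's slot is only safe if the surrounding code is designed for it, so a sketch like this is best read as a description of the access pattern rather than a recommendation.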


* Unbreak Apple ARM tests that now pass

Several `@test_broken` / `@test_skip` gates on Apple ARM (M-series) no longer apply with current LoopVectorization and the VectorizationBase nested-W=1 `_vstore_unroll!` fix.

- `condstore!` masked-store tests in `ifelsemasks.jl` (lines ~626-655) now produce matching results on Apple ARM; drop the Apple branch and test unconditionally for both Float32 and Float64.
- `Bernoulli_logitavx`/`Bernoulli_logit_avx` with `Vector{Bool}` and an `Int` α (`ifelsemasks.jl` line ~736) was `@test_skip`-ed but actually passes; convert to `@test`.
- Issue #543 W=1 nested VecUnroll store test in `staticsize.jl` was `@test_skip`-ed for v=1 on Apple ARM; with the VectorizationBase fix it now passes for all v=1..4, n=2..8.

The remaining ARM-gated breakage in `ifelsemasks.jl` (Bernoulli with a `BitVector` mask + Float64/Int α at lines ~715-722) and the `tullio_issue_131` pattern in `shuffleloadstores.jl` are deeper SIMD issues left as `@test_broken` with TODOs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Unbreak BitVector Bernoulli_logit tests on Apple ARM

With the companion VectorizationBase fix for dynamic-index BitArray loads with sub-byte alignment, `Bernoulli_logitavx` and `Bernoulli_logit_avx` now produce correct results for both `BitVector` and `Vector{Bool}` masks on Apple M-series. The Apple-aarch64 `@test_skip` / `@test_broken` branches are dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix unroll-cleanup tail bound for strided loads (tullio_issue_131)

`pointermax_index` builds the limit pointer that the unroll-cleanup termination check is compared against. The `sub > 0` branch already applies `incr` (when not statically known) and `stride` (when ≠ 1) to scale the loop length into a byte/element offset, but the `sub == 0` branch was pushing the raw `stophint` / `stopsym` straight through.

For any strided load on the unrolled axis (e.g. `arr[2i, ...]`) the cleanup bound came out `stride×` too small, so the final tail iteration was skipped whenever `looplen mod (UF*W) != 0`. On Apple ARM with W=2 for Float64, this dropped the last `out_i` iteration for every odd `out_i ≥ 3` in the tullio_issue_131 shape grid, and analogously for Float32 with W=4. The cleanup never ran for the 1–3 trailing elements, leaving them at whatever the output array was initialized to.

Confirmed correct after fix for all `(M, N) ∈ 4:24 × 2:8` on the tullio reproducer; `test/shuffleloadstores.jl` goes from 4255 pass / 686 broken to 4941 pass / 0 broken on Apple M-series.

Drop the matching `@test_broken` gate and the `tullio_issue_131` comment in `test/shuffleloadstores.jl`.

Fixes #570.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Loosen condstore == to ≈; re-gate VB-dependent tests until VB#127 release

Two CI regressions on the previous commits:

1. `condstore!` tests in `ifelsemasks.jl` (lines 626-637) use `==` to compare a SIMD-masked-store result against the scalar reference. On Apple ARM the two paths can differ by a 1-ULP rounding even though `@show`-printed values look identical (the original gate predates that observation). Switch to `≈`: the test still catches anything meaningful, just not artifacts of operation reordering.
2. The BitVector `Bernoulli_logit{,_}avx` tests in `ifelsemasks.jl`, the `Vector{Bool}` + Int α variants in the same block, and the W=1 nested-VecUnroll Issue #543 testset in `staticsize.jl` all depend on the JuliaSIMD/VectorizationBase.jl#127 fixes being available at runtime. That PR isn't tagged yet, so CI's stock VectorizationBase doesn't have it and the tests fail. Restore the `Sys.ARCH === :aarch64 && Sys.isapple()` gate (as `@test_broken` / `@test_skip`) with a comment pointing at VB#127. Once that release lands and LV's compat is bumped, the branches can be dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Use @test_skip for BitVector Bernoulli gates (Julia-version dependent)

`@test_broken` errors on "Unexpected Pass", which makes the BitVector + Int α Bernoulli test fail in Julia LTS macOS aarch64 CI even though the test happens to give the correct result there. The underlying bug (VectorizationBase BitVector load misalignment, fixed in VB#127) is present in some configurations but not others: Julia 1.10's older LLVM appears to dodge it for the test inputs in question.

Switch to `@test_skip` so the gate is loose either way: when the underlying bug bites, the test is skipped; when it doesn't, no error. After VB#127 is released and LV's compat is bumped, the entire branch can be dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Skip W=1 issue #543 test on all platforms (not just Apple aarch64)

The nested W=1 VecUnroll store path is picked by LoopVectorization on different (arch, julia version) combinations than originally assumed: the Julia nightly x86_64 macOS CI also hit it, not just Apple aarch64. The fix is in JuliaSIMD/VectorizationBase.jl#127 and not yet in a tagged release, so skip the v == 1 sub-case on every platform until LV's VectorizationBase compat is bumped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Rerun CI on top of bumped downstream releases

* Rerun CI on top of SLEEFPirates v0.6.46+

* Bump VectorizationBase compat to 0.21.74; drop @test_skip gates

VectorizationBase v0.21.74 ships the two fixes JuliaSIMD/VectorizationBase.jl#127 added:

- `_vstore_unroll!` for the nested W=1 (scalar lane) VecUnroll path, which `staticsize.jl`'s Issue #543 testset exercises with `v == 1`.
- The dynamic-index BitArray load misalignment fix that `ifelsemasks.jl`'s `Bernoulli_logitavx`/`Bernoulli_logit_avx` with `BitVector` masks depends on.

Bump LV's lower bound to `"0.21.74"` and drop the `@test_skip ... else @test ... end` branches I added while VB#127 was still in flight:

- `test/ifelsemasks.jl`: Bernoulli BitVector + Int α (4 tests), Vector{Bool} + Int α (2 tests), BitVector + Float64 α (2 tests).
- `test/staticsize.jl`: the `v == 1` Issue #543 sub-case (7 entries).

Local sweep on Apple M-series with the dev'd v0.21.74:

- `test/ifelsemasks.jl`: 435/435 pass (was 430/5 broken).
- `test/staticsize.jl` Issue #543 testset: 84/84 pass (was 70/77).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Retrigger CI to pick up ThreadingUtilities 0.5.6

ThreadingUtilities 0.5.6 (JuliaSIMD/ThreadingUtilities.jl#64) fixes the Julia 1.13+ OncePerThread MethodError in wake_thread! that was causing every pre/nightly job to red-flag part1 and part4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove Invalidations CI workflow

The SnoopCompileCore-based invalidations check has been broken since the SCPrettyTablesExt FieldError upstream regression and has been red across all recent PRs. The signal it produced (regressions in method-table invalidation count) hasn't been actionable for this repo; removing the workflow rather than keeping a perma-red check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
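As a small aside on the `==` versus `≈` change described in the commit message above, the following sketch shows why an exact comparison can fail when a sum is merely reordered. The numbers are illustrative and are not taken from the `condstore!` tests.

```julia
# Floating-point addition is not associative, so reordering a sum
# (as a vectorised loop may do) can change the last bit of the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

println(a == b)   # exact comparison
println(a ≈ b)    # isapprox with the default relative tolerance

# The same applies to two values exactly one ULP apart.
x = 1.0
y = nextfloat(x)
println(x == y, " ", x ≈ y)
```

With the default tolerance, `≈` accepts last-bit differences of this kind while still rejecting results that are meaningfully wrong.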

tecosaur added 2 commits May 2, 2026 21:24

Fix worktree counting on Julia 1.13

0fa4bb3

Bump version to 0.5.6

6a78229

oscardssmith approved these changes May 25, 2026


ChrisRackauckas approved these changes May 29, 2026


ChrisRackauckas merged commit 666d81f into JuliaSIMD:main May 29, 2026
15 of 20 checks passed

ChrisRackauckas mentioned this pull request May 30, 2026

Remove LoopVectorization MacOS ARM grant listing SciML/sciml.ai#240

Merged

1 task
