Use preferences to switch between SIMD and KernelAbstractions #133

Merged
b-fg merged 22 commits into WaterLily-jl:master from vchuravy:vc/backends on May 24, 2025

Conversation

@vchuravy
Contributor

I was experimenting with using PrecompileTools on WaterLily, and the choice to dispatch to the SIMD backend depending on the nthreads variable caused issues:

  1. In current versions of Julia, nthreads is no longer a constant.
  2. If someone precompiles code, nthreads == 1 in the precompilation process, thus exercising the wrong code path.

Opening this as a draft for now to solicit feedback. One would probably need to change the tests so that both code paths are tested.
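
For context, the dispatch pattern in master is roughly the following (my paraphrase, not the exact code):

if Threads.nthreads() == 1
    # serial @simd code path
else
    # KernelAbstractions code path
end

Since the precompilation process runs single-threaded, a precompile workload only ever exercises the serial branch, even for users who later start Julia with many threads.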

@b-fg
Member

b-fg commented Jun 21, 2024

Thanks for catching that. I was aware that nthreads == 1 during precompilation was problematic, but during execution it was working as intended. Using Preferences seems like a nice workaround. I will do some tests and integrate it.
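
Something like this minimal sketch is what I have in mind (the preference key "backend" and its default value are illustrative, not the final API):

using Preferences, WaterLily

# Sketch only: inside the package one would use @load_preference, so the
# value is baked in at precompile time and the compiled code path no
# longer depends on Threads.nthreads().
backend = load_preference(WaterLily, "backend", "KernelAbstractions")

if backend == "SIMD"
    @info "Using the serial @simd code path"
else
    @info "Using the KernelAbstractions code path"
end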

Also, not specifying the workgroup size did not yield a noticeable performance increase compared to 64 in the past (iirc). Has something changed in KA related to this? Is this now the recommended way to set up kernels?

@vchuravy
Contributor Author

Is this now the recommended way to set up kernels?

It is a bit tricky between CPU and GPU. Right now the KA backend on the CPU is rather slow since the basecase size is small. The CPU does much better with larger basecases. We don't have a way to calculate that basecase automatically, so we use 1024 on the CPU as a default.

On the GPU a static basecase is nice since it allows some of the integer index operations to be optimized away.

@b-fg
Member

b-fg commented Jun 21, 2024

I did some preliminary benchmarks with different mesh sizes N=2^(3*p) using this PR. Overall, it seems that the current PR is a bit slower than master on GPU. The only main difference is that the workgroup size is now not specified. Results are below, where the commits (which are wrongly tagged) refer to 33933fd == PR and a8a2506 == master:

Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   33933fd │ 1.10.2 │   Float32 │     3028166 │   1.41 │     0.58 │     1.00 │
│     GPU │   a8a2506 │ 1.10.2 │   Float32 │     2672719 │   2.11 │     0.55 │     1.05 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   33933fd │ 1.10.2 │   Float32 │     2671525 │   1.42 │     0.79 │     1.00 │
│     GPU │   a8a2506 │ 1.10.2 │   Float32 │     2339494 │   1.41 │     0.78 │     1.01 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 8
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   33933fd │ 1.10.2 │   Float32 │     2085611 │   0.38 │     2.98 │     1.00 │
│     GPU │   a8a2506 │ 1.10.2 │   Float32 │     1816307 │   0.25 │     2.79 │     1.07 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 9
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   33933fd │ 1.10.2 │   Float32 │     2160883 │   0.08 │    21.20 │     1.00 │
│     GPU │   a8a2506 │ 1.10.2 │   Float32 │     1798143 │   0.05 │    19.42 │     1.09 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘

@vchuravy
Contributor Author

It would be interesting to use CUDA.@profile to see if the kernel slowed down or if the "auto-tuning" adds that overhead.
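
For instance, something along these lines (a sketch; make_gpu_sim is a placeholder for however the benchmark builds the GPU Simulation):

using CUDA, WaterLily

sim = make_gpu_sim()          # hypothetical setup
sim_step!(sim)                # warm up so compilation is excluded
CUDA.@profile sim_step!(sim)  # inspect per-kernel timings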

@weymouth
Member

On my laptop GPU, I found no regression with this PR. In fact, a very small speed-up:

TGV (b01cdce is this PR, 5c78c37 is this PR with 64 workgroup size, f38bea4 is master)

▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   b01cdce │ 1.10.0 │   Float32 │     3166654 │   0.38 │     2.28 │     1.00 │
│     GPU │   5c78c37 │ 1.10.0 │   Float32 │     2745665 │   0.58 │     2.87 │     0.80 │
│     GPU │   f38bea4 │ 1.10.0 │   Float32 │     2799117 │   0.66 │     2.37 │     0.96 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   b01cdce │ 1.10.0 │   Float32 │     2787354 │   0.12 │     7.82 │     1.00 │
│     GPU │   5c78c37 │ 1.10.0 │   Float32 │     2394736 │   0.19 │     7.87 │     0.99 │
│     GPU │   f38bea4 │ 1.10.0 │   Float32 │     2442026 │   0.15 │     7.80 │     1.00 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘

Jelly

▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   b01cdce │ 1.10.0 │   Float32 │     2976119 │   0.53 │     1.82 │     1.00 │
│     GPU │   5c78c37 │ 1.10.0 │   Float32 │     2602224 │   0.46 │     2.01 │     0.91 │
│     GPU │   f38bea4 │ 1.10.0 │   Float32 │     2652446 │   0.47 │     1.97 │     0.93 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│     GPU │   b01cdce │ 1.10.0 │   Float32 │     3166982 │   0.24 │     5.45 │     1.00 │
│     GPU │   5c78c37 │ 1.10.0 │   Float32 │     2747379 │   0.17 │     5.74 │     0.95 │
│     GPU │   f38bea4 │ 1.10.0 │   Float32 │     2801011 │   0.15 │     5.75 │     0.95 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘

@b-fg b-fg mentioned this pull request Jul 23, 2024
@b-fg
Member

b-fg commented Aug 1, 2024

I did some more benchmarks after a local merge of master with this PR. All looks good except for removing the workgroup size as we had it before (64). Here 9b6ca77 is this PR merged with master and no workgroup size, and backends is this PR merged with master with workgroup size 64. Something is going on in the CPU backend of KA when the workgroup size is not specified, making it slower than the serial SIMD version. This is with the latest KA version (0.9.22).

Benchmarks
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │       78733 │   0.00 │    10.37 │           395.64 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │       78733 │   0.00 │    10.26 │           391.32 │     1.01 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │       78733 │   0.00 │    10.33 │           394.04 │     1.00 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     2223302 │   0.00 │     3.31 │           126.28 │     3.13 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     2187731 │   0.00 │    17.90 │           682.65 │     0.58 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     2274514 │   0.00 │     3.14 │           119.75 │     3.30 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │     3503858 │   0.00 │     3.22 │           122.65 │     3.23 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     3465887 │   0.00 │    16.89 │           644.44 │     0.61 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │     3555070 │   0.00 │     3.37 │           128.56 │     3.08 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     2619999 │   0.00 │     0.66 │            25.09 │    15.77 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     3030802 │   0.00 │     0.65 │            24.62 │    16.07 │
│    CUDA │    master │ 1.10.4 │   Float32 │     2671213 │   0.00 │     0.63 │            24.02 │    16.47 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │       70606 │   0.00 │    58.01 │           276.59 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │       70606 │   0.00 │    57.53 │           274.34 │     1.01 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │       70606 │   0.00 │    73.38 │           349.91 │     0.79 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     1976782 │   0.00 │    18.52 │            88.29 │     3.13 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     1945182 │   0.00 │    66.66 │           317.85 │     0.87 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     2021882 │   0.00 │    18.50 │            88.21 │     3.14 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │     3114382 │   0.00 │    20.32 │            96.90 │     2.85 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     3082782 │   0.00 │    63.37 │           302.19 │     0.92 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │     3159482 │   0.00 │    19.00 │            90.61 │     3.05 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     2301906 │   0.00 │     3.11 │            14.82 │    18.66 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     2683290 │   0.00 │     3.24 │            15.46 │    17.89 │
│    CUDA │    master │ 1.10.4 │   Float32 │     2347006 │   0.00 │     3.06 │            14.59 │    18.96 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │      196756 │   0.00 │     7.76 │           591.96 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │      196756 │   0.00 │     7.74 │           590.32 │     1.00 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │      196756 │   0.00 │     7.69 │           586.70 │     1.01 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     5428158 │   1.55 │     4.46 │           340.46 │     1.74 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     5341074 │   0.27 │    26.74 │          2040.17 │     0.29 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     5549932 │   1.54 │     4.52 │           344.74 │     1.72 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │     8524182 │   2.39 │     4.92 │           375.75 │     1.58 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     8437098 │   0.21 │    26.96 │          2057.04 │     0.29 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │     8645956 │   0.00 │     4.89 │           372.75 │     1.59 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     6416700 │   0.00 │     1.46 │           111.76 │     5.30 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     7253464 │   0.00 │     1.48 │           112.95 │     5.24 │
│    CUDA │    master │ 1.10.4 │   Float32 │     6542128 │   0.00 │     1.45 │           110.63 │     5.35 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │      230016 │   0.00 │    59.27 │           565.23 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │      230016 │   0.00 │    58.88 │           561.49 │     1.01 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │      230016 │   0.00 │    58.31 │           556.05 │     1.02 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     6408293 │   0.38 │    20.22 │           192.87 │     2.93 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     6305699 │   0.07 │   121.32 │          1156.96 │     0.49 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     6552597 │   0.38 │    20.15 │           192.15 │     2.94 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │    10078241 │   0.69 │    21.92 │           209.08 │     2.70 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     9975647 │   0.12 │   121.08 │          1154.70 │     0.49 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │    10222545 │   0.88 │    21.55 │           205.50 │     2.75 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     7642918 │   0.00 │     4.69 │            44.77 │    12.63 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     8713607 │   0.00 │     4.73 │            45.10 │    12.53 │
│    CUDA │    master │ 1.10.4 │   Float32 │     7792785 │   0.00 │     4.70 │            44.78 │    12.62 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@vchuravy
Contributor Author

vchuravy commented Aug 1, 2024

So the default workgroup size for KA is 1024. With 64 you create a lot of small tasks; what is the typical ndrange you use?

@b-fg
Member

b-fg commented Aug 1, 2024

For example, the TGV case is a 3D case for which I tested domain sizes of 64^3 and 128^3. The arrays we use are then (64,64,64) and (64,64,64,3) (analogously for the 128^3 grid), which is the ndrange we typically pass into the kernel. Also, I am not sure I tested this PR before with multi-threading on the CPU backend... I think it was just on the GPU (as reported previously).

@vchuravy
Contributor Author

vchuravy commented Aug 2, 2024

Ah, so you are getting perfectly sized blocks by accident xD

You may want to use (64, 64) instead as the workgroup size.
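
For reference, the static workgroup size is passed when instantiating the kernel, along these lines (a generic KA sketch, not WaterLily's @loop):

using KernelAbstractions

@kernel function add!(a, @Const(b))
    I = @index(Global, Cartesian)
    @inbounds a[I] += b[I]
end

a = rand(Float32, 64, 64, 64); b = rand(Float32, 64, 64, 64)
backend = CPU()

add!(backend, (64, 64))(a, b, ndrange=size(a))  # static (64, 64) workgroup
add!(backend)(a, b, ndrange=size(a))            # let KA pick the size
KernelAbstractions.synchronize(backend)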

@b-fg
Member

b-fg commented Aug 4, 2024

Sure, I will do some tests after my summer break. But does this mean that we cannot use the default workgroup size (as in this PR)? Could this be something to improve in KA, where it would try to automatically determine it based on ndrange?

@vchuravy
Contributor Author

vchuravy commented Aug 5, 2024

Yeah, I will need to improve this on the KA side.

@vchuravy
Contributor Author

vchuravy commented Aug 7, 2024

I just tagged a new KA version with the fix. This might remove the need for the SIMD variant entirely.

@b-fg
Member

b-fg commented Aug 21, 2024

I have tested the changes and, while the results improve, they are still not there (again, 9b6ca77 is this PR). There might be something else going on, but I am unsure what at the moment...

Benchmarks
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │       78733 │   0.00 │    10.42 │           397.53 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │       78733 │   0.00 │    10.24 │           390.64 │     1.02 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │       78733 │   0.00 │    10.29 │           392.71 │     1.01 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     2223302 │   0.00 │     3.34 │           127.43 │     3.12 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     1993389 │   0.00 │     4.25 │           162.06 │     2.45 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     2274514 │   0.00 │     3.20 │           121.89 │     3.26 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │     3503858 │   0.00 │     3.24 │           123.48 │     3.22 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     2647077 │   0.00 │     4.41 │           168.25 │     2.36 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │     3555070 │   0.00 │     3.32 │           126.53 │     3.14 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     2621768 │   0.00 │     0.65 │            24.76 │    16.05 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     3026963 │   0.00 │     0.63 │            23.91 │    16.62 │
│    CUDA │    master │ 1.10.4 │   Float32 │     2671140 │   0.00 │     0.68 │            25.79 │    15.42 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │       70606 │   0.00 │    58.85 │           280.64 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │       70606 │   0.00 │    57.75 │           275.38 │     1.02 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │       70606 │   0.00 │    57.62 │           274.74 │     1.02 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     1976782 │   0.00 │    20.92 │            99.76 │     2.81 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     1819590 │   0.00 │    24.61 │           117.34 │     2.39 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     2021882 │   0.00 │    21.37 │           101.89 │     2.75 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │     3114382 │   0.00 │    19.24 │            91.74 │     3.06 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     2737306 │   0.00 │    25.67 │           122.38 │     2.29 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │     3159482 │   0.00 │    22.54 │           107.47 │     2.61 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     2303706 │   0.00 │     3.09 │            14.71 │    19.07 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     2680490 │   0.00 │     3.25 │            15.48 │    18.12 │
│    CUDA │    master │ 1.10.4 │   Float32 │     2347008 │   0.00 │     3.16 │            15.05 │    18.65 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │      196756 │   0.00 │     7.89 │           601.60 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │      196756 │   0.00 │     7.71 │           588.41 │     1.02 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │      196756 │   0.00 │     7.71 │           588.06 │     1.02 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     5428158 │   1.59 │     4.54 │           346.53 │     1.74 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     3954012 │   0.00 │     4.35 │           331.67 │     1.81 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     5549932 │   1.49 │     4.73 │           361.22 │     1.67 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │     8524182 │   0.00 │     4.85 │           369.75 │     1.63 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     5237268 │   0.00 │     5.04 │           384.33 │     1.57 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │     8645956 │   2.37 │     5.15 │           392.91 │     1.53 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     6416699 │   0.00 │     1.45 │           110.72 │     5.43 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     7253470 │   0.00 │     1.47 │           112.14 │     5.36 │
│    CUDA │    master │ 1.10.4 │   Float32 │     6538380 │   0.00 │     1.48 │           112.78 │     5.33 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │  backends │ 1.10.4 │   Float32 │      230016 │   0.00 │    59.27 │           565.22 │     1.00 │
│  CPUx01 │   9b6ca77 │ 1.10.4 │   Float32 │      230016 │   0.00 │    58.71 │           559.87 │     1.01 │
│  CPUx01 │    master │ 1.10.4 │   Float32 │      230016 │   0.00 │    58.31 │           556.06 │     1.02 │
│  CPUx04 │  backends │ 1.10.4 │   Float32 │     6408293 │   0.37 │    20.44 │           194.94 │     2.90 │
│  CPUx04 │   9b6ca77 │ 1.10.4 │   Float32 │     5129556 │   0.00 │    29.92 │           285.32 │     1.98 │
│  CPUx04 │    master │ 1.10.4 │   Float32 │     6552597 │   0.39 │    21.95 │           209.37 │     2.70 │
│  CPUx08 │  backends │ 1.10.4 │   Float32 │    10078241 │   0.81 │    21.45 │           204.58 │     2.76 │
│  CPUx08 │   9b6ca77 │ 1.10.4 │   Float32 │     7343892 │   0.00 │    30.12 │           287.21 │     1.97 │
│  CPUx08 │    master │ 1.10.4 │   Float32 │    10222545 │   0.61 │    22.87 │           218.09 │     2.59 │
│    CUDA │  backends │ 1.10.4 │   Float32 │     7642918 │   0.00 │     4.69 │            44.69 │    12.65 │
│    CUDA │   9b6ca77 │ 1.10.4 │   Float32 │     8717354 │   0.00 │     4.77 │            45.46 │    12.43 │
│    CUDA │    master │ 1.10.4 │   Float32 │     7787222 │   0.00 │     4.70 │            44.86 │    12.60 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@marinlauber
Member

marinlauber commented Sep 13, 2024

@b-fg Something I picked up today: currently, the main tests will run on CuArray only if you have nvcc installed. If you use the Julia CUDA compiler, it doesn't install nvcc (at least not on my system). Same goes for AMD GPUs, I suppose.

It's kind of related to this, I suppose; that's why I added it here.

@b-fg
Member

b-fg commented Sep 13, 2024

Ah, but this is not a problem of this PR, but of WaterLily-Benchmarks, right? If you open an issue there, we can iterate on it.

You mean these test lines, right?

_cuda = check_compiler("nvcc","release")

This is not related to this PR though. The problem is how to automatically detect that CUDA is available without loading CUDA.jl first, and to come up with something that works on all OSes.
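
One cheap heuristic, as a sketch (an assumption on my side, not what the benchmark scripts currently do), is to look for the toolkit compiler on the PATH without loading CUDA.jl:

# Note: this misses CUDA.jl's artifact-provided toolkit, which is exactly
# the case reported above.
has_nvcc() = !isnothing(Sys.which("nvcc"))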

b-fg added 2 commits May 22, 2025 00:26
When the SIMD backend is selected, the loop macro generates only the for loop, without a function wrapper.
Also, dispatch based on the number of threads has been removed, and now only the backend-specific kernel is compiled.
The CI needs to be fixed for the allocations tests, which first need to set the SIMD backend and then re-run the tests.
@b-fg
Member

b-fg commented May 21, 2025

As a result of our conversation in #198, I thought it was about time to put this to use... So I have cleaned up the Preferences.jl routines a bit with the new API, and now @loop only compiles the kernel specific to the selected backend (dynamic dispatch based on the number of threads has been removed).

The only thing left to figure out is the allocation tests: I currently do not know how to update the CI so that the -t 1 tests are launched "twice", once to set the backend and once to compile and run with it. Maybe we have to use a fabricated test command for this, instead of julia-actions/julia-runtest@v1.
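
For the record, setting the backend from a script could look like this (a sketch; the "backend" preference key is the same assumption as above):

using Preferences, WaterLily

# Persist the choice into LocalPreferences.toml; a restart of Julia is
# then needed so that WaterLily recompiles with the SIMD code path.
set_preferences!(WaterLily, "backend" => "SIMD"; force=true)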

@b-fg b-fg marked this pull request as ready for review May 21, 2025 23:19
@codecov

codecov bot commented May 21, 2025

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Files with missing lines   Patch %   Lines
src/WaterLily.jl           0.00%     3 Missing ⚠️
src/util.jl                85.00%    3 Missing ⚠️

Files with missing lines   Coverage Δ
src/WaterLily.jl           65.90% <0.00%> (-24.10%) ⬇️
src/util.jl                80.34% <85.00%> (-1.24%) ⬇️

... and 2 files with indirect coverage changes


@b-fg
Member

b-fg commented May 21, 2025

To do:

  • Automate CI with single thread and Preferences
  • Check performance again: unspecified workgroup size vs 64 (master) vs (64,64)
  • Get rid of the warning when compiling for the KA backend
  • Implement function specialization for @loop kernels

@b-fg
Member

b-fg commented May 22, 2025

I have been experimenting again with the workgroup size, and the best results are almost always with a constant 64. Also, something that I do not understand is that the single-thread benchmarks are ~40% slower when the @simd for ... loop is not wrapped within a function. That is, the following implementation

@simd for $I ∈ $R
    @fastmath @inbounds $ex
end

is 40% slower than this one:

function $kern($(rep.(sym)...))
    @simd for $I ∈ $R
        @fastmath @inbounds $ex
    end
end
$kern($(sym...))

I do not understand why. In any case, I have reverted to the wrapped version, which is similar to what we have in master but without the dynamic dispatch based on the number of threads. And now I think we should really try the specialization for each argument, as we discussed, @weymouth.

With the current PR state, these are the benchmarks:

Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        1807 │   0.00 │     4.18 │           159.45 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │       80521 │   0.00 │     4.20 │           160.40 │     0.99 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     2273557 │   0.00 │     3.44 │           131.21 │     1.22 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     2329993 │   0.00 │     2.98 │           113.74 │     1.40 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     2933922 │   0.00 │     0.59 │            22.55 │     7.07 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     2978161 │   0.00 │     0.58 │            22.12 │     7.21 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        1807 │   0.00 │    25.42 │           121.19 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │       75316 │   0.00 │    25.94 │           123.70 │     0.98 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     2116285 │   0.00 │    18.10 │            86.31 │     1.40 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     2168103 │   0.00 │    17.17 │            81.88 │     1.48 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     2680201 │   0.00 │     3.01 │            14.37 │     8.43 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     2723973 │   0.00 │     3.01 │            14.35 │     8.45 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        7107 │   0.00 │     3.77 │           287.55 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │      161463 │   0.00 │     3.78 │           288.53 │     1.00 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     4357685 │   0.55 │     3.50 │           267.29 │     1.08 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     4456091 │   0.57 │     3.46 │           264.08 │     1.09 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     5627919 │   1.23 │     1.04 │            79.02 │     3.64 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     5705257 │   1.31 │     1.09 │            82.92 │     3.47 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        8307 │   0.00 │    26.63 │           253.92 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │      208459 │   0.00 │    26.98 │           257.32 │     0.99 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     5712303 │   0.16 │    17.73 │           169.11 │     1.50 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     5839681 │   0.16 │    18.12 │           172.81 │     1.47 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     7596622 │   0.38 │     3.82 │            36.46 │     6.96 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     7680551 │   0.40 │     3.81 │            36.31 │     6.99 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@b-fg
Member

b-fg commented May 22, 2025

New CI working nicely with LocalPreferences!

@b-fg
Member

b-fg commented May 22, 2025

@vchuravy any ideas on how to bypass the warning about using KA with a single thread (instead of SIMD) during precompilation? Otherwise, since we use KA as the default backend and precompilation typically happens with a single thread, the warning always pops up, which is not ideal. I am not sure how this can be addressed.

@b-fg
Member

b-fg commented May 22, 2025

The @loop macro now implements automatic specialization for the function wrapping the kernels, so that all arguments get their own type parameters, such as

function kern(a::A,b::B,c::C,...) where {A,B,C,...}
    ...
end

Below is an example:

Details
@macroexpand WaterLily.@loop a[I] += b[I] over I in CartesianIndices(a)
quote
    #= /home/b-fg/Workspace/tudelft1/WaterLily.jl/src/util.jl:155 =#
    function var"##kern#242"(a::DBGS, b::FHGH) where {DBGS, FHGH}
        #= /home/b-fg/Workspace/tudelft1/WaterLily.jl/src/util.jl:155 =#
        #= /home/b-fg/Workspace/tudelft1/WaterLily.jl/src/util.jl:156 =#
        begin
            #= simdloop.jl:69 =#
            let var"##r#244" = CartesianIndices(a)
                #= simdloop.jl:70 =#
                for var"##i#245" = Base.simd_outer_range(var"##r#244")
                    #= simdloop.jl:71 =#
                    let var"##n#246" = Base.simd_inner_length(var"##r#244", var"##i#245")
                        #= simdloop.jl:72 =#
                        if zero(var"##n#246") < var"##n#246"
                            #= simdloop.jl:74 =#
                            let var"##i#247" = zero(var"##n#246")
                                #= simdloop.jl:75 =#
                                while var"##i#247" < var"##n#246"
                                    #= simdloop.jl:76 =#
                                    local I = Base.simd_index(var"##r#244", var"##i#245", var"##i#247")
                                    #= simdloop.jl:77 =#
                                    begin
                                        #= /home/b-fg/Workspace/tudelft1/WaterLily.jl/src/util.jl:157 =#
                                        begin
                                            $(Expr(:inbounds, true))
                                            local var"#4#val" = (a[I] += b[I])
                                            $(Expr(:inbounds, :pop))
                                            var"#4#val"
                                        end
                                        #= /home/b-fg/Workspace/tudelft1/WaterLily.jl/src/util.jl:158 =#
                                    end
                                    #= simdloop.jl:78 =#
                                    var"##i#247" += 1
                                    #= simdloop.jl:79 =#
                                    $(Expr(:loopinfo, Symbol("julia.simdloop"), nothing))
                                    #= simdloop.jl:80 =#
                                end
                            end
                        end
                    end
                    #= simdloop.jl:84 =#
                end
            end
            #= simdloop.jl:86 =#
            nothing
        end
    end
    #= /home/b-fg/Workspace/tudelft1/WaterLily.jl/src/util.jl:160 =#
    var"##kern#242"(a, b)
end

@b-fg
Member

b-fg commented May 22, 2025

Consistent 1-2% speedup on all backends, and we will (hopefully) be able to specialize kernels that take a function argument. So this ticks all the boxes :)

Benchmarks
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        1807 │   0.00 │     4.11 │           156.91 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │       80521 │   0.00 │     4.20 │           160.40 │     0.98 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     2273557 │   0.00 │     2.94 │           112.31 │     1.40 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     2329993 │   0.00 │     2.98 │           113.74 │     1.38 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     2933921 │   0.00 │     0.58 │            22.19 │     7.07 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     2978161 │   0.00 │     0.58 │            22.12 │     7.09 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        1807 │   0.00 │    25.42 │           121.23 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │       75316 │   0.00 │    25.94 │           123.70 │     0.98 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     2116285 │   0.00 │    17.01 │            81.13 │     1.49 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     2168103 │   0.00 │    17.17 │            81.88 │     1.48 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     2680200 │   0.00 │     2.98 │            14.19 │     8.54 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     2723973 │   0.00 │     3.01 │            14.35 │     8.45 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        7107 │   0.00 │     3.75 │           285.84 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │      161463 │   0.00 │     3.78 │           288.53 │     0.99 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     4357685 │   0.60 │     3.42 │           261.09 │     1.09 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     4456091 │   0.57 │     3.46 │           264.08 │     1.08 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     5627921 │   0.00 │     1.02 │            77.65 │     3.68 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     5705257 │   1.31 │     1.09 │            82.92 │     3.45 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │  backends │ 1.11.3 │   Float32 │        8307 │   0.00 │    26.49 │           252.60 │     1.00 │
│     CPUx01 │    master │ 1.11.3 │   Float32 │      208459 │   0.00 │    26.98 │           257.32 │     0.98 │
│     CPUx04 │  backends │ 1.11.3 │   Float32 │     5712303 │   0.15 │    17.84 │           170.11 │     1.48 │
│     CPUx04 │    master │ 1.11.3 │   Float32 │     5839681 │   0.16 │    18.12 │           172.81 │     1.46 │
│ GPU-NVIDIA │  backends │ 1.11.3 │   Float32 │     7604281 │   0.52 │     3.80 │            36.19 │     6.98 │
│ GPU-NVIDIA │    master │ 1.11.3 │   Float32 │     7680551 │   0.40 │     3.81 │            36.31 │     6.96 │
└────────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@b-fg
Member

b-fg commented May 22, 2025

Once we address the warning issue, this PR is good to go!

@marinlauber
Member

Consistent 1-2% speedup on all backends, and we will (hopefully) be able to specialize kernels that take a function argument. So this ticks all the boxes :)

Nice! It's interesting that allocations are down for a single thread but not for multiple threads.

@b-fg
Member

b-fg commented May 23, 2025

Yes! Removing the dynamic dispatch based on the number of threads, and instead just compiling the SIMD kernel based on LocalPreferences, brought allocations down significantly for a single thread. The general small speedup then resulted from specializing the wrapper function.

@b-fg b-fg merged commit 3b304d0 into WaterLily-jl:master May 24, 2025
7 of 8 checks passed