[cufinufft] fold_rescale/interval fails in for float32 inputs on large 1D (~1e8) uniform grids

While working on the review for PR #663, I came into a segfault bug occurring in index calculation from the combination of `fold_rescale` with `interval`. This happens roughly when `N` has more digits than a floating point number can accurately represent (~7 digits for `float32`, ~15 digits for `float64`). So the `segfault` is really only practically valid in `1D` when using `float32`, though I think the issue is more broad than that. This bug is not only present in the new "output driven" method, though doesn't produce a segfault in the other methods, just spreads to the wrong(?) points. Looking at the `spread_1d_nuptsdriven` code (all comments are mine)

```c++
    const auto x_rescaled     = fold_rescale(x[idxnupts[i]], nf1); // float32 of O(nf1), possible loss of precision)
    const auto cnow           = c[idxnupts[i]];
    const auto [xstart, xend] = interval(ns, x_rescaled); // last digits of xstart possibly wrong
    const T x1                = (T)xstart - x_rescaled;
    if constexpr (KEREVALMETH == 1)
      eval_kernel_vec_horner<T, ns>(ker1, x1, sigma);
    else
      eval_kernel_vec<T, ns>(ker1, x1, es_c, es_beta);

    for (auto xx = xstart; xx <= xend; xx++) {
      // segfault averted by wrapping all values of xx, though density spreads to the wrong gridpoints
      auto ix          = xx < 0 ? xx + nf1 : (xx > nf1 - 1 ? xx - nf1 : xx); 
      const T kervalue = ker1[xx - xstart];
      atomicAdd(&fw[ix].x, cnow.x * kervalue);
      atomicAdd(&fw[ix].y, cnow.y * kervalue);
    }
```

Similar safety wrappers are written on every other `fold_rescale` operation to prevent illegal memory address access and properly handle periodic boundary issues. I think it's reasonable to ask if this is actually a bug or a feature, since ~100 million grid points is... a lot, and I'm unsure if spreading to slightly the wrong(?) grid points will do anything actually noticeable. That said, as written, what is normally the infrequent `off-by-one` errors in spreading can be considerably amplified to `off-by-100` errors if the user requests large uniform grids for whatever reason. 

Remediation:
* upcast to `double`?
  * I think this is almost equally wrong for low `N` -- it's just adding random digits to our `float`, but the added intermediate precision should make index-finding slightly more accurate. If my thinking is clear, it should at least fix the issue where large `N` leads to _very_ wrong index calculation.
  * I have verified that somewhat frequently, even for low `N`, upcasting to `double`, not surprisingly, does give different index calculations 
* Tell the user "don't do that" -- probably mathematically justified, but I'm not the expert here
* I don't know. Just need to bring this to discussion since it will give negative indices in the output-driven method as currently written, and it's worth considering how to handle these issues

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[cufinufft] fold_rescale/interval fails in for float32 inputs on large 1D (~1e8) uniform grids #701

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[cufinufft] fold_rescale/interval fails in for float32 inputs on large 1D (~1e8) uniform grids #701

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions