Fixed-point arithmetic (nfixed) improvements

The best way to do bulk few-limb arithmetic on real numbers seems to be to convert to fixed-point form for things like matrix multiplication, FFT and polynomial multiplication. This generally means using somewhat higher precision than floating-point for equivalent accuracy, but being able to skip all the shifts, normalisations and branches of floating-point seems to make up for this drawback. Additions are notably cheaper in fixed-point, and this is especially important for subclassical algorithms (Karatsuba, Strassen, ...) where the idea is to trade multiplications for additions.

Here are some things to improve/fix:

* [Probably wontfix] One minor annoyance is how to represent signs. Twos complement would be better than sign+magnitude for adding and subtracting, as these operations then become completely branchless, but then multiplications become more complex and ultimately I don't think you gain anything. Currently a whole extra limb is used for the sign bit; this isn't ideal either, and one could steal a single low or high bit of the fraction limbs instead, but then you need some extra masking operations which might not be worth the trouble.

* Right now additions and subtractions are inlined up to n = 8 limbs. For larger n inlining additions everywhere might be overkill, but we should investigate whether using some hardcoded addition functions for various lengths > 8 is significantly faster than ``mpn_add_n``/``mpn_sub_n``.

* For multiplying, it would suffice to compute less accurate high products with O(n) error instead ~2 ulp error most of the time, so it would be useful to have hardcoded and basecase assembly for these sloppier variants of mulhigh up to a few limbs.

* As usual, dot products are the workhorse operation. For now, I have implemented ``_nfixed_dot_2``, ``_nfixed_dot_3`` and ``_nfixed_dot_4`` in C using ``umul_ppmm`` / ``add_ssaaaa`` / ``add_sssaaaaaa`` where I break up the sums so that there are no long carry chains. Full assembly implementations interleaving the multiplications and additions optimally should be even faster though. For the smallest ``n`` one might want the whole dot product in assembly; for larger ``n`` some ``mpn_addmulhigh_n`` type functions might suffice.

* ~~One has to scale inputs to $(-\varepsilon,\varepsilon)$ in such a way that no intermediate result escapes $(-1,1)$. I have not rigorously proved that this scaling is correct in the algorithms implemented so far, which of course will be absolutely necessary for eventual use by ``arb`` etc. Ideally, the bound should also not just be proven, but also tight, so that one wastes no precision. In some cases, it would be more efficient to use an extra limb for the output to store carry-out into an integer part, but that is not implemented. Similarly: proving error bounds.~~ Done, but not published.

* The most important and also most difficult problem: how to convert non-uniform floating-point matrices/polynomials to fixed-point to achieve full entrywise accuracy. Currently this is done very pessimistically by increasing the precision, avoiding fixed-point completely when things are too poorly scaled (including when any zeros are present). It is clearly possible to do much more clever things.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixed-point arithmetic (nfixed) improvements #2065

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Fixed-point arithmetic (nfixed) improvements #2065

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions