
PPU: AltiVec uint->float conversion (vcfux) rounding possibly incorrect for 0x80000000+ #9417

Open
@Triang3l

Description


The interpreter implementation of the VCFUX AltiVec instruction (uint -> float conversion followed by multiplication by 2^-n) converts the low 31 bits of the number as a signed integer and adds 2147483648.0f if the source is >= 0x80000000 (because SSE only has sint -> float conversion). This appears to round incorrectly when the source number is above 0x80000000.

I haven't run any tests on RPCS3 to check this, but we used a similar conversion method in Xenia and found this issue by running a few tests on a real Xbox 360. Specifically, with vcfux (0 exponent bias), 0xFFFDFF7E and 0xFFFCFF7D were converted to 0x4F7FFE00 and 0x4F7FFD00 this way, while on a real console the instruction produces 0x4F7FFDFF and 0x4F7FFCFF. I think the behavior should be the same on the PS3.

The issue with this method is that it performs rounding twice — first during the sint -> float conversion, and then during the 2147483648.0f addition.
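For reference, here is a minimal scalar sketch (not the actual RPCS3 interpreter code) that reproduces the double rounding on the values tested above; it assumes strict single-precision evaluation (SSE math, not x87):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    static unsigned bits_of(float f) {
        uint32_t u;
        std::memcpy(&u, &f, sizeof(u));
        return u;
    }

    int main() {
        const uint32_t tests[] = {0xFFFDFF7Eu, 0xFFFCFF7Du};
        for (uint32_t x : tests) {
            // Two-step method: low 31 bits as signed, then add 2^31 (rounds twice).
            float two_step = float(int32_t(x & 0x7FFFFFFF)) + 2147483648.0f;
            // Single correctly rounded conversion for comparison.
            float direct = float(x);
            std::printf("%08X: two-step %08X, direct %08X\n",
                        (unsigned)x, bits_of(two_step), bits_of(direct));
        }
        // Prints 4F7FFE00 vs 4F7FFDFF and 4F7FFD00 vs 4F7FFCFF,
        // matching the difference observed against the real console.
    }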

Correct conversion

For example, here's what happens when converting the number 0b11000000000000000000000101000001. It has a value of 1.1000000000000000000000101000001 * 2^31: an implicit leading 1 and 31 explicitly stored mantissa bits at this precision.

To correctly convert this number, the stored part of the mantissa needs to be rounded to 23 bits — we have 31 bits, so 8 bits need to be dropped:

10000000000000000000001✂️01000001

The part being cut off is 01000001 — it's smaller than 10000000, thus it's unambiguously rounded down. The final result is:

1.10000000000000000000001 * 2^31
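For comparison, a single correctly rounded uint -> float conversion on the host (assuming the default round-to-nearest mode) gives exactly this result for the example value, 0b11000000000000000000000101000001 = 0xC0000141:

    // Hypothetical quick check, relying on the host's correctly rounded
    // unsigned -> float conversion in round-to-nearest mode.
    uint32_t x = 0xC0000141u;   // 1.1000000000000000000000101000001 * 2^31
    float f = float(x);         // one rounding step only
    // f now has the bit pattern 0x4F400001,
    // i.e. 1.10000000000000000000001 * 2^31 - the correctly rounded result.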

SInt > Float + 2147483648.0 conversion

Here's what happens when the conversion of the same number is done in two steps instead:

First, we're taking the low 31 bits of the number so it can be passed to the sint > float conversion instruction. Instead of 1.1000000000000000000000101000001 * 2^31, it becomes 0.1000000000000000000000101000001 * 2^31 as a result.

Normalizing this, we get:

1.000000000000000000000101000001 * 2^30

— 30 bits of the mantissa are explicitly stored now. This is rounded to nearest to 23 bits, so 7 bits need to be dropped:

00000000000000000000010✂️1000001

However, this time we have 1000001 in the part that's cut off. It's bigger than 1000000, thus it's unambiguously rounded up. So after the sint > float conversion, we get:

1.00000000000000000000011 * 2^30

Now for the part where it goes wrong. We need to add 1.00000000000000000000011 * 2^30 to 2^31. Aligning the exponents and doing the addition, we get:

  0.10000000000000000000001|1 * 2^31
+ 1.00000000000000000000000   * 2^31
= 1.10000000000000000000001|1 * 2^31

Here we have 24 explicitly stored significand bits in the result — we need to round that to 23 to write the single-precision result. Again, rounding to nearest even:

10000000000000000000001✂️1

1✂️1 is odd + 0.5, a tie that has to be broken to the nearest even. The kept value is odd, and the nearest even to 1.5 is 2, so we're rounding up. We end up with the following number:

1.10000000000000000000010 * 2^31

However, this is different from the correctly rounded result:

1.10000000000000000000001 * 2^31
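A quick scalar check of the same example value shows the discrepancy directly (again assuming strict single-precision evaluation):

    // Hypothetical check of the two-step path for the example value 0xC0000141.
    uint32_t x = 0xC0000141u;
    float two_step = float(int32_t(x & 0x7FFFFFFF)) + 2147483648.0f;
    float direct = float(x);
    // two_step has the bit pattern 0x4F400002 (1.10000000000000000000010 * 2^31),
    // direct   has the bit pattern 0x4F400001 (1.10000000000000000000001 * 2^31).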

Implementation

I have implemented the correct rounding in AVX (though easily portable to SSE, with slightly more instructions) in this Xenia commit: xenia-project/xenia@5c47a3a

The idea is to perform the conversion in the [0x00000000, 0x7FFFFFFF] range as normal, but to do the conversion for [0x80000000, 0xFFFFFFFF] entirely manually in integers — this is simple because the resulting number can either have an exponent of 31, or be exactly 2^32.

First, the usual round-to-nearest-even bias is added: 0x7F plus the parity bit (the lowest bit that will be kept in the mantissa), with the high bit of the original number left unmodified. The result starts with 0b1… in the general case, or ends up somewhere near 0b0… if the value was close to 2^32 and the addition overflowed.

Next, to complete the rounding by dropping the excess bits, an arithmetic right shift by 8 is done. For the majority of the range the high bit (0x80000000) is still set, so sign extension not only places the explicitly stored mantissa bits in the low 23 bits of the float, but also writes -1 into the exponent bits and above them. However, if the number should be converted to 4294967296.0, the addition has zeroed the high bit, so sign extension writes 0 into the exponent.

So now we have 0b1_11111111_mmmmmmmmmmmmmmmmmmmmmmm for numbers that need to have the exponent of 31 in the end, and 0b0_00000000_00000000000000000000000 for 2^32.

What we need to do now is bias the exponent: add the bit representation of 2^32, so the all-ones exponent field acts as -1 and produces an exponent of 31, while the all-zeros one keeps the exponent at 32.

The final operation is a simple selection between the [0x00000000, 0x7FFFFFFF] and the [0x80000000, 0xFFFFFFFF] range results — in AVX vblendvps selects based on the upper bit, in SSE the same can be done with arithmetic >>31 and and-andnot-or.
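For reference, here is a minimal scalar sketch of the same scheme (the linked Xenia commit does this with AVX intrinsics across the whole vector; this is only the per-element logic, with the 2^-n exponent bias omitted, and it relies on arithmetic right shifts of negative values, which hold on the relevant targets):

    #include <cstdint>
    #include <cstring>

    // Per-element uint32 -> float conversion with a single rounding step,
    // following the approach described above (0 exponent bias).
    float vcfux_element(uint32_t x) {
        if (x < 0x80000000u) {
            // [0x00000000, 0x7FFFFFFF]: the normal signed conversion already
            // produces the correctly rounded result.
            return float(int32_t(x));
        }
        // [0x80000000, 0xFFFFFFFF]: build the float manually in integers.
        // Round-to-nearest-even bias: 0x7F plus the lowest kept bit (bit 8).
        // The addition may wrap past 2^32, clearing the high bit - that
        // happens exactly when the value rounds up to 4294967296.0f.
        uint32_t rounded = x + 0x7F + ((x >> 8) & 1);
        // Arithmetic >> 8: the rounded mantissa lands in bits 22..0, and sign
        // extension fills the exponent field (and the sign bit) with all-ones
        // in the general case, or all-zeros in the 2^32 case.
        uint32_t shifted = uint32_t(int32_t(rounded) >> 8);
        // Re-bias the exponent by adding the bit pattern of 2^32 (0x4F800000):
        // the all-ones exponent acts as -1 and yields an exponent of 31,
        // while the all-zeros one yields 4294967296.0f exactly.
        uint32_t bits = shifted + 0x4F800000u;
        float f;
        std::memcpy(&f, &bits, sizeof(f));
        return f;
    }

The if/else here plays the role of the final range selection; the vectorized version does the same with vblendvps (or the arithmetic >>31 and and-andnot-or sequence on SSE).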

I don't have experience with LLVM yet, so I couldn't locate the JIT implementation of this (the UItoFP LLVM operation); maybe it's already correct there, but at least the interpreter seems to be broken. If needed, I can try making a pull request, though I may need some hints on how to find things in LLVM lib/Target/x86. It may also be worth checking whether the SPUs are affected by this.
