Skip to content

Conversation

@pmatos
Copy link
Collaborator

@pmatos pmatos commented Nov 5, 2025

Reimplementation of quiet and signalling nan propagation on x87

Initially implemented in #4835, later reverted in #4981.

Depends on #5011 .

Copy link
Collaborator

@bylaws bylaws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm missing something but I would expect the strict option here to control the silencenan code emission in storestackmem, FromF64_PerserveNan is definitely cheap enough as to not require an option at least. Perhaps it would be nicer to make reduced precision and enum rather than adding a new option

@pmatos pmatos force-pushed the fix/silenceNaN-upstream branch from c24486e to 7cd789c Compare November 5, 2025 14:46
@bylaws
Copy link
Collaborator

bylaws commented Nov 5, 2025

Whats the instcountci blow up if this is enabled btw? I'm curious

@pmatos pmatos force-pushed the fix/silenceNaN-upstream branch from 7cd789c to 8db894f Compare November 5, 2025 15:10
@pmatos
Copy link
Collaborator Author

pmatos commented Nov 17, 2025

Perhaps I'm missing something but I would expect the strict option here to control the silencenan code emission in storestackmem, FromF64_PerserveNan is definitely cheap enough as to not require an option at least. Perhaps it would be nicer to make reduced precision and enum rather than adding a new option

I understand that from a performance perspective we could do the check in FromF64_PerserveNan but it does feel weird to me to do the check sometimes but not others. IMO, it makes more sense to have all the checks behind the same flag.

The downside of making reduced precision an enum is that this is not backwards compatible and will require everyone to update their config files. I think it was initially @Sonicadvance1 who suggested a stricter version of reduced precision in my initial pr. Wonder what he would think of making reduced precision and strenum. :)

@pmatos
Copy link
Collaborator Author

pmatos commented Nov 17, 2025

Whats the instcountci blow up if this is enabled btw? I'm curious

I will get these for you today.

@pmatos pmatos force-pushed the fix/silenceNaN-upstream branch 3 times, most recently from a30769d to 530c77a Compare November 17, 2025 14:09
@bylaws
Copy link
Collaborator

bylaws commented Nov 17, 2025

I was researching if theres a better way to do this, and came up with below ASM:

   movi    v31.4s, #0x40 
    shl     v31.4s, v31.4s, #22
    fcmeq   v1.4s, v0.4s, v0.4s
    orr     v2.8b, v0.8b, v31.8b
    bif     v0.8b, v2.8b, v1.8b     

@Sonicadvance1 think we could just enable by default given this? given in most cases it's effectively 3 insts, could get to two insts with sve (albeit same per mca since fcmuo has much lower throughput)

@pmatos pmatos force-pushed the fix/silenceNaN-upstream branch from 530c77a to eb78617 Compare November 18, 2025 10:27
@pmatos
Copy link
Collaborator Author

pmatos commented Nov 24, 2025

I was researching if theres a better way to do this, and came up with below ASM:

   movi    v31.4s, #0x40 
    shl     v31.4s, v31.4s, #22
    fcmeq   v1.4s, v0.4s, v0.4s
    orr     v2.8b, v0.8b, v31.8b
    bif     v0.8b, v2.8b, v1.8b     

@Sonicadvance1 think we could just enable by default given this? given in most cases it's effectively 3 insts, could get to two insts with sve (albeit same per mca since fcmuo has much lower throughput)

Nice - I took a stab at trying that out.

@pmatos pmatos force-pushed the fix/silenceNaN-upstream branch from eb78617 to 608701e Compare November 24, 2025 20:23
@pmatos
Copy link
Collaborator Author

pmatos commented Nov 24, 2025

@bylaws that was great. take a look. :)

}
}

void UpdateX87PrecisionConfig() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function isn't correct. This'll tear the config across what FEX_CONFIG_OPT has already been loaded versus not. This should be moved to FEXCore/Source/Interface/Config/Config.cpp in ReloadMetaLayer towards the end. You can see how single stepping changes the MaxInst config and you could do something similar for this there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will take a look. regarding @bylaws comment, should we then keep the strict mode even with the reduced performance impact?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get some perf numbers from the x87 heavy games? I can see Psychonauts multiblock things have doubled in size.

@bylaws
Copy link
Collaborator

bylaws commented Nov 24, 2025

You'll want to pool the casted constant through RA, rather than materializing it every time before benchmarking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants