Pre-submission Checklist
GPU Hardware
Intel Arc B580
DRI Devices Information
moritz@opencl-pc:~/Documents/FluidX3D$ ls -ls /dev/dri/*
0 crw-rw----+ 1 root video 226, 1 Apr 2 18:39 /dev/dri/card1
0 crw-rw----+ 1 root video 226, 2 Apr 2 2026 /dev/dri/card2
0 crw-rw----+ 1 root video 226, 3 Apr 2 18:44 /dev/dri/card3
0 crw-rw----+ 1 root video 226, 4 Apr 2 2026 /dev/dri/card4
0 crw-rw----+ 1 root render 226, 128 Apr 2 2026 /dev/dri/renderD128
0 crw-rw----+ 1 root render 226, 129 Apr 2 2026 /dev/dri/renderD129
0 crw-rw----+ 1 root render 226, 130 Apr 2 2026 /dev/dri/renderD130
0 crw-rw----+ 1 root render 226, 131 Apr 2 2026 /dev/dri/renderD131
/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root 8 Apr 2 2026 pci-0000:00:02.0-card -> ../card3
0 lrwxrwxrwx 1 root root 13 Apr 2 2026 pci-0000:00:02.0-render -> ../renderD130
0 lrwxrwxrwx 1 root root 8 Apr 2 2026 pci-0000:03:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 Apr 2 2026 pci-0000:03:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root 8 Apr 2 2026 pci-0000:04:00.0-card -> ../card4
0 lrwxrwxrwx 1 root root 13 Apr 2 2026 pci-0000:04:00.0-render -> ../renderD131
0 lrwxrwxrwx 1 root root 8 Apr 2 2026 pci-0000:75:00.0-card -> ../card2
0 lrwxrwxrwx 1 root root 13 Apr 2 2026 pci-0000:75:00.0-render -> ../renderD129
GPU Detailed Information (lspci output)
moritz@opencl-pc:~/Documents/FluidX3D$ lspci
00:00.0 Host bridge: Intel Corporation Raptor Lake-S Host Bridge/DRAM Controller (rev 01)
00:01.0 PCI bridge: Intel Corporation Raptor Lake PCI Express 5.0 Graphics Port (PEG010) (rev 01)
00:01.1 PCI bridge: Intel Corporation Device a72d (rev 01)
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
00:0a.0 Signal processing controller: Intel Corporation Raptor Lake Crashlog and Telemetry (rev 01)
00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation
00:14.0 USB controller: Intel Corporation Raptor Lake USB 3.2 Gen 2x2 (20 Gb/s) XHCI Host Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Raptor Lake-S PCH Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Raptor Lake-S PCH CNVi WiFi (rev 11)
00:15.0 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #1 (rev 11)
00:15.2 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #2 (rev 11)
00:16.0 Communication controller: Intel Corporation Raptor Lake CSME HECI #1 (rev 11)
00:17.0 SATA controller: Intel Corporation Raptor Lake SATA AHCI Controller (rev 11)
00:1a.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #25 (rev 11)
00:1b.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #17 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #1 (rev 11)
00:1c.3 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #4 (rev 11)
00:1c.4 PCI bridge: Intel Corporation Device 7a3c (rev 11)
00:1d.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #9 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Raptor Lake LPC/eSPI Controller (rev 11)
00:1f.3 Audio device: Intel Corporation Raptor Lake High Definition Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Raptor Lake-S PCH SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Raptor Lake SPI (flash) Controller (rev 11)
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 11)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 11)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] (rev ff)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN Xp] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
07:00.0 Ethernet controller: Aquantia Corp. AQtion AQC113CS NBase-T/IEEE 802.3an Ethernet Controller [Antigua 10G] (rev 03)
08:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 06)
09:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:01.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:02.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:03.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0b:00.0 USB controller: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020]
3f:00.0 USB controller: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020]
73:00.0 PCI bridge: Intel Corporation Device e2ff (rev 01)
74:01.0 PCI bridge: Intel Corporation Device e2f0
74:02.0 PCI bridge: Intel Corporation Device e2f1
75:00.0 VGA compatible controller: Intel Corporation Battlemage G21 [Arc B580]
76:00.0 Audio device: Intel Corporation Device e2f7
moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 75:00.0
75:00.0 VGA compatible controller: Intel Corporation Battlemage G21 [Arc B580] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1100
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 202
IOMMU group: 35
Region 0: Memory at 76000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 6800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at 77000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee00cb8 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 3145728ns
Max no snoop latency: 3145728ns
Kernel driver in use: xe
Kernel modules: xe
moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04) (prog-if 00 [VGA controller])
DeviceName: Onboard IGD
Subsystem: ASUSTeK Computer Inc. Raptor Lake-S GT1 [UHD Graphics 770]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 219
IOMMU group: 0
Region 0: Memory at 6c76000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 4000000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at 6000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+ FLReset+
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
Address: fee00018 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Process Address Space ID (PASID)
PASIDCap: Exec- Priv-, Max PASID Width: 14
PASIDCtl: Enable- Exec- Priv-
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [300 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00008000, Page Request Allocation: 00000000
Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
IOVSta: Migration-
Initial VFs: 7, Total VFs: 7, Number of VFs: 0, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: a780
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000004010000000 (64-bit, non-prefetchable)
Region 2: Memory at 0000004020000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Kernel driver in use: i915
Kernel modules: i915, xe
moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 03:00.0
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] (rev ff) (prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Navi 32 [Radeon RX 7700 XT / 7800 XT]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 200
IOMMU group: 20
Region 0: Memory at 6000000000 (64-bit, prefetchable) [size=16G]
Region 2: Memory at 6400000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at 5000 [size=256]
Region 5: Memory at 7a300000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at 7a400000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s, Width x16
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn+
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00c98 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [200 v1] Physical Resizable BAR
BAR 0: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
BAR 2: current size: 256MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB
Capabilities: [240 v1] Power Budgeting <?>
Capabilities: [270 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
Capabilities: [320 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [450 v1] Lane Margining at the Receiver <?>
Kernel driver in use: amdgpu
Kernel modules: amdgpu
moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 04:00.0
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN Xp] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GP102 [TITAN Xp]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 225
IOMMU group: 22
Region 0: Memory at 78000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at 6c60000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at 6c70000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at 4000 [size=128]
Expansion ROM at 79000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00f58 Data: 0000
Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 34326183936ns
Max no snoop latency: 34326183936ns
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
Driver Version
26.05.37020.3
Installed GPU Driver Packages
moritz@opencl-pc:~/Documents/FluidX3D$ sudo dpkg --list | grep -iE "igc|gmm|opencl|level-zero|fc|level_zero|ocloc|libze"
ii clinfo 3.0.23.01.25-1build1 amd64 Query OpenCL system information
ii fcitx5 5.1.7-1build3 amd64 Next generation of Fcitx Input Method Framework
ii fcitx5-chinese-addons 5.1.3-1build3 all Chinese-related addon for fcitx5 (metapackage)
ii fcitx5-chinese-addons-bin 5.1.3-1build3 amd64 Chinese-related addon for fcitx5 (binary tools)
ii fcitx5-chinese-addons-data 5.1.3-1build3 all Chinese-related addon for fcitx5 (shared data files)
ii fcitx5-config-qt 5.1.4-1 amd64 configuration tool for Fcitx5 (Qt version)
ii fcitx5-data 5.1.7-1build3 all Fcitx Input Method Framework v5 (common data files)
ii fcitx5-frontend-all 5.1.7-1build3 all Fcitx Input Method Framework v5 (IM Module Metapackage)
ii fcitx5-frontend-gtk3 5.1.1-1build2 amd64 GTK3 IM Module for fcitx5
ii fcitx5-frontend-gtk4 5.1.1-1build2 amd64 GTK4 IM Module for fcitx5
ii fcitx5-frontend-qt5 5.1.4-1build5 amd64 Qt5 IM module for fcitx5
ii fcitx5-frontend-qt6 5.1.4-1build5 amd64 Qt6 IM module for fcitx5
ii fcitx5-material-color 0.2.1-1 all UI theme for fcitx5 following Material Design
ii fcitx5-module-chttrans:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (chttrans module)
ii fcitx5-module-cloudpinyin:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (cloudpinyin module)
ii fcitx5-module-fullwidth:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (fullwidth module)
ii fcitx5-module-lua:amd64 5.0.12-1 amd64 Lua support for fcitx5
ii fcitx5-module-lua-common 5.0.12-1 all Lua support for fcitx5 (common files)
ii fcitx5-module-pinyinhelper:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (pinyinhelper module)
ii fcitx5-module-punctuation:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (punctuation module)
ii fcitx5-modules:amd64 5.1.7-1build3 amd64 Fcitx Input Method Framework v5 (core modules)
ii fcitx5-pinyin:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (builtin pinyin support)
ii fcitx5-pinyin-gui:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (builtin pinyin GUI tools)
ii fcitx5-table:amd64 5.1.3-1build3 amd64 Fcitx Input Method Framework v5 (builtin table support)
ii intel-igc-core-2 2.28.4 amd64 Intel(R) Graphics Compiler for OpenCL(TM)
ii intel-igc-opencl-2 2.28.4 amd64 Intel(R) Graphics Compiler for OpenCL(TM)
ii intel-ocloc 26.05.37020.3-0 amd64 Tool for managing Intel Compute GPU device binary format
ii intel-opencl-icd 26.05.37020.3-0 amd64 Intel graphics compute runtime for OpenCL
ii kde-config-fcitx5 5.1.4-1 amd64 KDE configuration module for Fcitx5
ii libcbor0.10:amd64 0.10.2-1.2ubuntu2 amd64 library for parsing and generating CBOR (RFC 7049)
ii libdebconfclient0:amd64 0.271ubuntu3 amd64 Debian Configuration Management System (C-implementation library)
ii libfcitx5-qt-data 5.1.4-1build5 all Qt library and IM module for fcitx5 (data files)
ii libfcitx5-qt1:amd64 5.1.4-1build5 amd64 Qt library and IM module for fcitx5
ii libfcitx5-qt6-1:amd64 5.1.4-1build5 amd64 Qt6 library and IM module for fcitx5
ii libfcitx5config6:amd64 5.1.7-1build3 amd64 Fcitx Input Method Framework v5 (config library)
ii libfcitx5core7:amd64 5.1.7-1build3 amd64 Fcitx Input Method Framework v5 (core library)
ii libfcitx5gclient2:amd64 5.1.1-1build2 amd64 GLib-based D-Bus client library for fcitx5 (library)
ii libfcitx5utils2:amd64 5.1.7-1build3 amd64 Fcitx Input Method Framework v5 (utils library)
ii libfile-fcntllock-perl 0.22-4ubuntu5 amd64 Perl module for file locking with fcntl(2)
ii libigdgmm12:amd64 22.9.0 amd64 Intel Graphics Memory Management Library -- shared library
ii libkf5configcore5:amd64 5.115.0-0ubuntu5 amd64 configuration settings framework for Qt
ii libkf5newstuffcore5:amd64 5.115.0-0ubuntu5 amd64 Support for downloading application assets from the network.
ii libsigc++-2.0-0v5:amd64 2.12.1-2 amd64 type-safe Signal Framework for C++ - runtime
ii liburiparser1:amd64 0.9.7+dfsg-2build1 amd64 URI parsing library compliant with RFC 3986
ii libze-intel-gpu1 26.05.37020.3-0 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii ocl-icd-libopencl1:amd64 2.3.2-1build1 amd64 Generic OpenCL ICD Loader
ii ocl-icd-libopencl1:i386 2.3.2-1build1 i386 Generic OpenCL ICD Loader
ii ocl-icd-opencl-dev:amd64 2.3.2-1build1 amd64 OpenCL development files
ii opencl-c-headers 3.0~2023.12.14-1 all OpenCL (Open Computing Language) C header files
ii opencl-clhpp-headers 3.0~2023.12.14-1ubuntu1 all C++ headers for OpenCL development
ii pocl-opencl-icd:amd64 5.0-2.1build3 amd64 pocl ICD
ii python3-idna 3.6-2ubuntu0.1 all Python IDNA2008 (RFC 5891) handling (Python 3)
ii rocm-opencl 2.0.0.70200-43~24.04 amd64 clr built using CMake
ii rocm-opencl-dev 2.0.0.70200-43~24.04 amd64 clr built using CMake
ii rocm-opencl-runtime 7.2.0.70200-43~24.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-opencl-sdk 7.2.0.70200-43~24.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rpp 2.2.0.70200-43~24.04 amd64 Computer vision library for AMD CPUs and GPUs with HOST/HIP/OpenCL back-ends
ii rpp-dev 2.2.0.70200-43~24.04 amd64 Computer vision library for AMD CPUs and GPUs with HOST/HIP/OpenCL back-ends
Driver Installation Details
Vanilla Ubuntu 24.04.4 LTS, with the OpenCL runtime installed following https://github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md#0-install-gpu-drivers-and-opencl-runtime
Linux Distribution
Ubuntu 24.04.4 LTS
Kernel Version & Boot Parameters
moritz@opencl-pc:~/Documents/FluidX3D$ uname -r
6.17.0-19-generic
Actual Behavior
Running the FluidX3D multi-GPU benchmark on 2 GPUs hangs during initialization, at the first OpenCL kernel execution, whenever one of the GPUs is an Intel Arc B580.
Every other combination of GPUs, e.g. Nvidia Titan Xp + Intel UHD 770 or AMD RX 7700 XT + Nvidia Titan Xp, works without issues.
I tracked the hang down to the execution of 2 small OpenCL kernels, transfer_extract_rho_u_flags and transfer__insert_rho_u_flags. Here is the entire relevant code section:
#define uxx uint
uxx index(const uint3 xyz) { // assemble 1D index from 3D coordinates (x,y,z -> n)
    return (uxx)xyz.x+(uxx)(xyz.y+xyz.z*def_Ny)*(uxx)def_Nx; // n = x+(y+z*Ny)*Nx
}
uint get_area(const uint direction) {
    const uint A[3] = { def_Ax, def_Ay, def_Az };
    return A[direction];
}
uxx index_extract_p(const uint a, const uint direction) {
    const uint3 coordinates[3] = { (uint3)(def_Nx-2u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz, def_Ny-2u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx, def_Nz-2u) };
    return index(coordinates[direction]);
}
uxx index_extract_m(const uint a, const uint direction) {
    const uint3 coordinates[3] = { (uint3)( 1u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz, 1u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx, 1u) };
    return index(coordinates[direction]);
}
uxx index_insert_p(const uint a, const uint direction) {
    const uint3 coordinates[3] = { (uint3)(def_Nx-1u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz, def_Ny-1u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx, def_Nz-1u) };
    return index(coordinates[direction]);
}
uxx index_insert_m(const uint a, const uint direction) {
    const uint3 coordinates[3] = { (uint3)( 0u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz, 0u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx, 0u) };
    return index(coordinates[direction]);
}
void extract_rho_u_flags(const uint a, const uint A, const uxx n, global char* transfer_buffer, const global float* rho, const global float* u, const global uchar* flags) {
    ((global float*)transfer_buffer)[ a] = rho[ n];
    ((global float*)transfer_buffer)[ A+a] = u[ n];
    ((global float*)transfer_buffer)[ 2u*A+a] = u[ def_N+(ulong)n];
    ((global float*)transfer_buffer)[ 3u*A+a] = u[2ul*def_N+(ulong)n];
    ((global uchar*)transfer_buffer)[16u*A+a] = flags[ n]; // <-- this causes the hang on Intel Arc B580!
}
void insert_rho_u_flags(const uint a, const uint A, const uxx n, const global char* transfer_buffer, global float* rho, global float* u, global uchar* flags) {
    rho[ n] = ((const global float*)transfer_buffer)[ a];
    u[ n] = ((const global float*)transfer_buffer)[ A+a];
    u[ def_N+(ulong)n] = ((const global float*)transfer_buffer)[ 2u*A+a];
    u[2ul*def_N+(ulong)n] = ((const global float*)transfer_buffer)[ 3u*A+a];
    flags[ n] = ((const global uchar*)transfer_buffer)[16u*A+a]; // <-- this causes extreme slowdown on Intel Arc B580!
}
kernel void transfer_extract_rho_u_flags(const uint direction, const ulong t, global char* transfer_buffer_p, global char* transfer_buffer_m, const global float* rho, const global float* u, const global uchar* flags) {
    const uint a=get_global_id(0), A=get_area(direction); // a = domain area index for each side, A = area of the domain boundary
    if(a>=A) return; // area might not be a multiple of cl_workgroup_size, so return here to avoid writing in unallocated memory space
    extract_rho_u_flags(a, A, index_extract_p(a, direction), transfer_buffer_p, rho, u, flags);
    extract_rho_u_flags(a, A, index_extract_m(a, direction), transfer_buffer_m, rho, u, flags);
}
kernel void transfer__insert_rho_u_flags(const uint direction, const ulong t, const global char* transfer_buffer_p, const global char* transfer_buffer_m, global float* rho, global float* u, global uchar* flags) {
    const uint a=get_global_id(0), A=get_area(direction); // a = domain area index for each side, A = area of the domain boundary
    if(a>=A) return; // area might not be a multiple of cl_workgroup_size, so return here to avoid writing in unallocated memory space
    insert_rho_u_flags(a, A, index_insert_p(a, direction), transfer_buffer_p, rho, u, flags);
    insert_rho_u_flags(a, A, index_insert_m(a, direction), transfer_buffer_m, rho, u, flags);
}
The code lines responsible for the hang and the slowdown are:
((global uchar*)transfer_buffer)[16u*A+a] = flags[ n]; // <-- this causes the hang on Intel Arc B580!
flags[ n] = ((const global uchar*)transfer_buffer)[16u*A+a]; // <-- this causes extreme slowdown on Intel Arc B580!
The issue is not the cast of global char* transfer_buffer to (global uchar*)transfer_buffer, or of const global char* transfer_buffer to (const global uchar*)transfer_buffer, but the 8-bit integer memory store/load itself. The store causes the hang, and the load causes an extreme slowdown of all subsequent kernel executions. This is a regression in the Intel GPU driver. Please fix!
Expected Behavior
FluidX3D should successfully initialize and complete the benchmark on any combination of 2 or more GPUs, without hang or slowdown.
Reproduction Rate
Always reproduces - 100%
Steps to Reproduce
You need a system with an Intel Arc B580 GPU, Ubuntu 24.04.4 LTS, and the Intel OpenCL runtime installed following the instructions here and here.
Clone FluidX3D:
git clone https://github.com/ProjectPhysX/FluidX3D.git && cd FluidX3D
chmod +x make.sh
Modify src/setup.cpp:
- comment out this line:
//LBM lbm(256u, 256u, 256u, 1.0f); // default
- uncomment these lines (keep the 1-GPU line commented out):
const uint memory = 1488u; // memory occupation in MB (for multi-GPU benchmarks: make this close to as large as the GPU's VRAM capacity)
const uint3 lbm_N = (resolution(float3(1.0f, 1.0f, 1.0f), memory)/4u)*4u; // input: simulation box aspect ratio and VRAM occupation in MB, output: grid resolution
//LBM lbm(1u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 1u, 1u, 1u, 1.0f); // 1 GPU
LBM lbm(2u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 2u, 1u, 1u, 1.0f); // 2 GPUs
Compile and run on Intel Arc B580 + Intel Arc B580 (yes, the same GPU ID can be selected twice), making sure that the OpenCL device ID of the Intel Arc B580 (here 0) is passed twice.
The run then hangs here:
moritz@opencl-pc:~/Documents/FluidX3D$ ./make.sh 0 0
Info: Detected Operating System: Linux
Info: Compiling with 24 CPU cores.
g++ -c src/kernel.cpp -o temp/kernel.o -std=c++17 -pthread -O -Wno-comment
g++ temp/*.o -o bin/FluidX3D -std=c++17 -pthread -O -Wno-comment -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
.-----------------------------------------------------------------------------.
| ______________ ______________ |
| \ ________ | | ________ / |
| \ \ | | | | / / |
| \ \ | | | | / / |
| \ \ | | | | / / |
| \ \_.-" | | "-._/ / |
| \ _.-" _ "-._ / |
| \.-" _.-" "-._ "-./ |
| .-" .-"-. "-. |
| \ v" "v / |
| \ \ / / |
| \ \ / / |
| \ \ / / |
| \ ' / |
| \ / |
| \ / FluidX3D Version 3.6 |
| ' Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID 0 | Intel(R) Arc(TM) B580 Graphics |
| Device ID 1 | Intel(R) UHD Graphics 770 |
| Device ID 2 | NVIDIA TITAN Xp |
| Device ID 3 | 13th Gen Intel(R) Core(TM) i7-13700K |
| Device ID 4 | AMD Radeon RX 7700 XT |
| Device ID 5 | cpu-haswell-13th Gen Intel(R) Core(TM) i7-13700K |
|----------------'------------------------------------------------------------|
| Info: Creating domains |
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | Intel(R) Arc(TM) B580 Graphics |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 26.05.37020.3 (Linux) |
| OpenCL Version | OpenCL C 3.0 |
| Compute Units | 160 at 2850 MHz (2560 cores, 14.592 TFLOPs/s) |
| Memory, Cache | 12215 MB VRAM, 18432 KB global / 128 KB local |
| Buffer Limits | 11605 MB global, 11883724 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| Info: Allocating memory. This may take a few seconds. |
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | Intel(R) Arc(TM) B580 Graphics |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 26.05.37020.3 (Linux) |
| OpenCL Version | OpenCL C 3.0 |
| Compute Units | 160 at 2850 MHz (2560 cores, 14.592 TFLOPs/s) |
| Memory, Cache | 12215 MB VRAM, 18432 KB global / 128 KB local |
| Buffer Limits | 11605 MB global, 11883724 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| Info: Allocating memory. This may take a few seconds. |
Without the hang, the output would continue like this:
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 528 x 264 x 264 = 36799488 |
| Grid Domains | 2 x 1 x 1 = 2 |
| LBM Type | D3Q19 SRT (FP32/FP16S) |
| Memory Usage | CPU 596 MB, GPU 2x 972 MB |
| Max Alloc Size | 666 MB |
| Time Steps | 500 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 373 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 4360 | 336 GB/s | 118 | 497 99% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4414 |
moritz@opencl-pc:~/Documents/FluidX3D$
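As a sanity check, the figures in the expected output are consistent with the 2-GPU setup from src/setup.cpp. The sketch below reproduces them under two stated assumptions. First, it assumes 77 bytes of memory traffic per cell and time step for D3Q19 FP16S: 19 DDFs loaded and stored as 16-bit values, plus one flags byte. Second, it assumes 17 bytes of host memory per cell, for rho, u and flags. These per-cell byte counts are assumptions chosen because they match the printed numbers. They are not quoted from the FluidX3D source.

```cpp
#include <cstdint>

// Values copied from the expected benchmark output above.
constexpr std::uint64_t kNx = 528, kNy = 264, kNz = 264; // Grid Resolution
constexpr std::uint64_t kCells = kNx * kNy * kNz;         // 36799488 cells in total
constexpr double kMLUPs = 4360.0;                         // printed MLUPs

// ASSUMPTION: per time step, D3Q19 FP16S loads and stores each of the 19 DDFs
// once as a 2-byte value, plus one 1-byte flags load per cell.
constexpr std::uint64_t kTrafficBytesPerCell = 2 * 19 * 2 + 1; // 77 bytes

// ASSUMPTION: the host keeps rho (4 bytes), u (3 x 4 bytes) and flags (1 byte) per cell.
constexpr std::uint64_t kHostBytesPerCell = 4 + 3 * 4 + 1; // 17 bytes

// The grid is split along x into 2 x 1 x 1 domains, one per GPU.
constexpr std::uint64_t cells_per_domain() { return kCells / 2; }

// Effective bandwidth in GB/s (10^9 bytes/s): about 335.7, printed as 336.
constexpr double bandwidth_gb_per_s() { return kMLUPs * 1e6 * kTrafficBytesPerCell / 1e9; }

// Host memory in MB (2^20 bytes), truncated by integer division: 596.
constexpr std::uint64_t host_memory_mb() { return kCells * kHostBytesPerCell / (1024 * 1024); }

static_assert(kCells == 36799488, "matches the printed Grid Resolution");
static_assert(cells_per_domain() == 264ull * 264 * 264, "each B580 domain holds 264^3 cells");
```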
Is this a regression?
Last Known Working Driver Version
25.18.33578.6
First Known Failing Driver Version
26.05.37020.3
Source Code / Reproducer
https://github.com/ProjectPhysX/FluidX3D