[GSD-12575] OpenCL kernel execution hangs in FluidX3D multi-GPU when Intel Arc B580 is one of the GPUs - (global char*) memory load/store causes kernel hang! #912

@ProjectPhysX

Pre-submission Checklist

  • I am using the latest GPU driver version (releases)
  • I have searched for similar issues and found none

GPU Hardware

Intel Arc B580

DRI Devices Information

moritz@opencl-pc:~/Documents/FluidX3D$ ls -ls /dev/dri/*
0 crw-rw----+ 1 root video  226,   1 Apr  2 18:39 /dev/dri/card1
0 crw-rw----+ 1 root video  226,   2 Apr  2  2026 /dev/dri/card2
0 crw-rw----+ 1 root video  226,   3 Apr  2 18:44 /dev/dri/card3
0 crw-rw----+ 1 root video  226,   4 Apr  2  2026 /dev/dri/card4
0 crw-rw----+ 1 root render 226, 128 Apr  2  2026 /dev/dri/renderD128
0 crw-rw----+ 1 root render 226, 129 Apr  2  2026 /dev/dri/renderD129
0 crw-rw----+ 1 root render 226, 130 Apr  2  2026 /dev/dri/renderD130
0 crw-rw----+ 1 root render 226, 131 Apr  2  2026 /dev/dri/renderD131

/dev/dri/by-path:
total 0
0 lrwxrwxrwx 1 root root  8 Apr  2  2026 pci-0000:00:02.0-card -> ../card3
0 lrwxrwxrwx 1 root root 13 Apr  2  2026 pci-0000:00:02.0-render -> ../renderD130
0 lrwxrwxrwx 1 root root  8 Apr  2  2026 pci-0000:03:00.0-card -> ../card1
0 lrwxrwxrwx 1 root root 13 Apr  2  2026 pci-0000:03:00.0-render -> ../renderD128
0 lrwxrwxrwx 1 root root  8 Apr  2  2026 pci-0000:04:00.0-card -> ../card4
0 lrwxrwxrwx 1 root root 13 Apr  2  2026 pci-0000:04:00.0-render -> ../renderD131
0 lrwxrwxrwx 1 root root  8 Apr  2  2026 pci-0000:75:00.0-card -> ../card2
0 lrwxrwxrwx 1 root root 13 Apr  2  2026 pci-0000:75:00.0-render -> ../renderD129
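
Read together with the lspci listing below, the by-path links indicate that the Arc B580 (75:00.0) is card2/renderD129, the UHD 770 (00:02.0) is card3/renderD130, the Radeon (03:00.0) is card1/renderD128, and the Titan Xp (04:00.0) is card4/renderD131. As a rough sketch, the same mapping could be pulled out of the `ls -l` text with a few lines of Python; the function name `map_pci_to_nodes` and the regular expression are illustrative assumptions, not part of FluidX3D or any driver tooling.

```python
import re

# Matches by-path symlink lines such as:
#   ... pci-0000:75:00.0-render -> ../renderD129
LINK_RE = re.compile(
    r"pci-[0-9a-f]{4}:([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])-(card|render) -> \.\./(\S+)"
)

def map_pci_to_nodes(listing: str) -> dict[str, dict[str, str]]:
    """Return {pci_address: {"card": node, "render": node}} from `ls -l` text."""
    nodes: dict[str, dict[str, str]] = {}
    for line in listing.splitlines():
        match = LINK_RE.search(line)
        if match:
            address, kind, node = match.groups()
            nodes.setdefault(address, {})[kind] = node
    return nodes

# Two lines copied from the listing above (the Arc B580 entries).
b580_links = """\
0 lrwxrwxrwx 1 root root  8 Apr  2  2026 pci-0000:75:00.0-card -> ../card2
0 lrwxrwxrwx 1 root root 13 Apr  2  2026 pci-0000:75:00.0-render -> ../renderD129
"""
print(map_pci_to_nodes(b580_links))
```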

GPU Detailed Information (lspci output)

moritz@opencl-pc:~/Documents/FluidX3D$ lspci
00:00.0 Host bridge: Intel Corporation Raptor Lake-S Host Bridge/DRAM Controller (rev 01)
00:01.0 PCI bridge: Intel Corporation Raptor Lake PCI Express 5.0 Graphics Port (PEG010) (rev 01)
00:01.1 PCI bridge: Intel Corporation Device a72d (rev 01)
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04)
00:0a.0 Signal processing controller: Intel Corporation Raptor Lake Crashlog and Telemetry (rev 01)
00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller Intel Corporation
00:14.0 USB controller: Intel Corporation Raptor Lake USB 3.2 Gen 2x2 (20 Gb/s) XHCI Host Controller (rev 11)
00:14.2 RAM memory: Intel Corporation Raptor Lake-S PCH Shared SRAM (rev 11)
00:14.3 Network controller: Intel Corporation Raptor Lake-S PCH CNVi WiFi (rev 11)
00:15.0 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #0 (rev 11)
00:15.1 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #1 (rev 11)
00:15.2 Serial bus controller: Intel Corporation Raptor Lake Serial IO I2C Host Controller #2 (rev 11)
00:16.0 Communication controller: Intel Corporation Raptor Lake CSME HECI #1 (rev 11)
00:17.0 SATA controller: Intel Corporation Raptor Lake SATA AHCI Controller (rev 11)
00:1a.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #25 (rev 11)
00:1b.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #17 (rev 11)
00:1c.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #1 (rev 11)
00:1c.3 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #4 (rev 11)
00:1c.4 PCI bridge: Intel Corporation Device 7a3c (rev 11)
00:1d.0 PCI bridge: Intel Corporation Raptor Lake PCI Express Root Port #9 (rev 11)
00:1f.0 ISA bridge: Intel Corporation Raptor Lake LPC/eSPI Controller (rev 11)
00:1f.3 Audio device: Intel Corporation Raptor Lake High Definition Audio Controller (rev 11)
00:1f.4 SMBus: Intel Corporation Raptor Lake-S PCH SMBus Controller (rev 11)
00:1f.5 Serial bus controller: Intel Corporation Raptor Lake SPI (flash) Controller (rev 11)
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 11)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 11)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] (rev ff)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN Xp] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
07:00.0 Ethernet controller: Aquantia Corp. AQtion AQC113CS NBase-T/IEEE 802.3an Ethernet Controller [Antigua 10G] (rev 03)
08:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 06)
09:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:00.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:01.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:02.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0a:03.0 PCI bridge: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] (rev 02)
0b:00.0 USB controller: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020]
3f:00.0 USB controller: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020]
73:00.0 PCI bridge: Intel Corporation Device e2ff (rev 01)
74:01.0 PCI bridge: Intel Corporation Device e2f0
74:02.0 PCI bridge: Intel Corporation Device e2f1
75:00.0 VGA compatible controller: Intel Corporation Battlemage G21 [Arc B580]
76:00.0 Audio device: Intel Corporation Device e2f7
moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 75:00.0
75:00.0 VGA compatible controller: Intel Corporation Battlemage G21 [Arc B580] (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device 1100
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin ? routed to IRQ 202
        IOMMU group: 35
        Region 0: Memory at 76000000 (64-bit, non-prefetchable) [size=16M]
        Region 2: Memory at 6800000000 (64-bit, prefetchable) [size=16G]
        Expansion ROM at 77000000 [disabled] [size=2M]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1
                        TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
                Address: 00000000fee00cb8  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [d0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 0
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [110 v1] Null
        Capabilities: [200 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable-, Smallest Translation Unit: 00
        Capabilities: [420 v1] Physical Resizable BAR
                BAR 2: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
        Capabilities: [400 v1] Latency Tolerance Reporting
                Max snoop latency: 3145728ns
                Max no snoop latency: 3145728ns
        Kernel driver in use: xe
        Kernel modules: xe

moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-S GT1 [UHD Graphics 770] (rev 04) (prog-if 00 [VGA controller])
        DeviceName: Onboard IGD
        Subsystem: ASUSTeK Computer Inc. Raptor Lake-S GT1 [UHD Graphics 770]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 219
        IOMMU group: 0
        Region 0: Memory at 6c76000000 (64-bit, non-prefetchable) [size=16M]
        Region 2: Memory at 4000000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 6000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag- RBE+ FLReset+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
                Address: fee00018  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v1] Process Address Space ID (PASID)
                PASIDCap: Exec- Priv-, Max PASID Width: 14
                PASIDCtl: Enable- Exec- Priv-
        Capabilities: [200 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Capabilities: [300 v1] Page Request Interface (PRI)
                PRICtl: Enable- Reset-
                PRISta: RF- UPRGI- Stopped+
                Page Request Capacity: 00008000, Page Request Allocation: 00000000
        Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration- 10BitTagReq- Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy- 10BitTagReq-
                IOVSta: Migration-
                Initial VFs: 7, Total VFs: 7, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 1, stride: 1, Device ID: a780
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 0000004010000000 (64-bit, non-prefetchable)
                Region 2: Memory at 0000004020000000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Kernel driver in use: i915
        Kernel modules: i915, xe

moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 03:00.0
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] (rev ff) (prog-if 00 [VGA controller])
        Subsystem: Sapphire Technology Limited Navi 32 [Radeon RX 7700 XT / 7800 XT]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 200
        IOMMU group: 20
        Region 0: Memory at 6000000000 (64-bit, prefetchable) [size=16G]
        Region 2: Memory at 6400000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 5000 [size=256]
        Region 5: Memory at 7a300000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 7a400000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x16
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn+
                LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
                LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00c98  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [200 v1] Physical Resizable BAR
                BAR 0: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
                BAR 2: current size: 256MB, supported: 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB
        Capabilities: [240 v1] Power Budgeting <?>
        Capabilities: [270 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [2a0 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [2d0 v1] Process Address Space ID (PASID)
                PASIDCap: Exec+ Priv+, Max PASID Width: 10
                PASIDCtl: Enable- Exec- Priv-
        Capabilities: [320 v1] Latency Tolerance Reporting
                Max snoop latency: 1048576ns
                Max no snoop latency: 1048576ns
        Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
        Capabilities: [450 v1] Lane Margining at the Receiver <?>
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

moritz@opencl-pc:~/Documents/FluidX3D$ sudo lspci -vvv -k -s 04:00.0
04:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN Xp] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation GP102 [TITAN Xp]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 225
        IOMMU group: 22
        Region 0: Memory at 78000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 6c60000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 6c70000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at 4000 [size=128]
        Expansion ROM at 79000000 [virtual] [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00f58  Data: 0000
        Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x8 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [250 v1] Latency Tolerance Reporting
                Max snoop latency: 34326183936ns
                Max no snoop latency: 34326183936ns
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [420 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Driver Version

26.05.37020.3

Installed GPU Driver Packages

moritz@opencl-pc:~/Documents/FluidX3D$ sudo dpkg --list | grep -iE "igc|gmm|opencl|level-zero|fc|level_zero|ocloc|libze"
ii  clinfo                                         3.0.23.01.25-1build1                             amd64        Query OpenCL system information
ii  fcitx5                                         5.1.7-1build3                                    amd64        Next generation of Fcitx Input Method Framework
ii  fcitx5-chinese-addons                          5.1.3-1build3                                    all          Chinese-related addon for fcitx5 (metapackage)
ii  fcitx5-chinese-addons-bin                      5.1.3-1build3                                    amd64        Chinese-related addon for fcitx5 (binary tools)
ii  fcitx5-chinese-addons-data                     5.1.3-1build3                                    all          Chinese-related addon for fcitx5 (shared data files)
ii  fcitx5-config-qt                               5.1.4-1                                          amd64        configuration tool for Fcitx5 (Qt version)
ii  fcitx5-data                                    5.1.7-1build3                                    all          Fcitx Input Method Framework v5 (common data files)
ii  fcitx5-frontend-all                            5.1.7-1build3                                    all          Fcitx Input Method Framework v5 (IM Module Metapackage)
ii  fcitx5-frontend-gtk3                           5.1.1-1build2                                    amd64        GTK3 IM Module for fcitx5
ii  fcitx5-frontend-gtk4                           5.1.1-1build2                                    amd64        GTK4 IM Module for fcitx5
ii  fcitx5-frontend-qt5                            5.1.4-1build5                                    amd64        Qt5 IM module for fcitx5
ii  fcitx5-frontend-qt6                            5.1.4-1build5                                    amd64        Qt6 IM module for fcitx5
ii  fcitx5-material-color                          0.2.1-1                                          all          UI theme for fcitx5 following Material Design
ii  fcitx5-module-chttrans:amd64                   5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (chttrans module)
ii  fcitx5-module-cloudpinyin:amd64                5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (cloudpinyin module)
ii  fcitx5-module-fullwidth:amd64                  5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (fullwidth module)
ii  fcitx5-module-lua:amd64                        5.0.12-1                                         amd64        Lua support for fcitx5
ii  fcitx5-module-lua-common                       5.0.12-1                                         all          Lua support for fcitx5 (common files)
ii  fcitx5-module-pinyinhelper:amd64               5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (pinyinhelper module)
ii  fcitx5-module-punctuation:amd64                5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (punctuation module)
ii  fcitx5-modules:amd64                           5.1.7-1build3                                    amd64        Fcitx Input Method Framework v5 (core modules)
ii  fcitx5-pinyin:amd64                            5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (builtin pinyin support)
ii  fcitx5-pinyin-gui:amd64                        5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (builtin pinyin GUI tools)
ii  fcitx5-table:amd64                             5.1.3-1build3                                    amd64        Fcitx Input Method Framework v5 (builtin table support)
ii  intel-igc-core-2                               2.28.4                                           amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-igc-opencl-2                             2.28.4                                           amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-ocloc                                    26.05.37020.3-0                                  amd64        Tool for managing Intel Compute GPU device binary format
ii  intel-opencl-icd                               26.05.37020.3-0                                  amd64        Intel graphics compute runtime for OpenCL
ii  kde-config-fcitx5                              5.1.4-1                                          amd64        KDE configuration module for Fcitx5
ii  libcbor0.10:amd64                              0.10.2-1.2ubuntu2                                amd64        library for parsing and generating CBOR (RFC 7049)
ii  libdebconfclient0:amd64                        0.271ubuntu3                                     amd64        Debian Configuration Management System (C-implementation library)
ii  libfcitx5-qt-data                              5.1.4-1build5                                    all          Qt library and IM module for fcitx5 (data files)
ii  libfcitx5-qt1:amd64                            5.1.4-1build5                                    amd64        Qt library and IM module for fcitx5
ii  libfcitx5-qt6-1:amd64                          5.1.4-1build5                                    amd64        Qt6 library and IM module for fcitx5
ii  libfcitx5config6:amd64                         5.1.7-1build3                                    amd64        Fcitx Input Method Framework v5 (config library)
ii  libfcitx5core7:amd64                           5.1.7-1build3                                    amd64        Fcitx Input Method Framework v5 (core library)
ii  libfcitx5gclient2:amd64                        5.1.1-1build2                                    amd64        GLib-based D-Bus client library for fcitx5 (library)
ii  libfcitx5utils2:amd64                          5.1.7-1build3                                    amd64        Fcitx Input Method Framework v5 (utils library)
ii  libfile-fcntllock-perl                         0.22-4ubuntu5                                    amd64        Perl module for file locking with fcntl(2)
ii  libigdgmm12:amd64                              22.9.0                                           amd64        Intel Graphics Memory Management Library -- shared library
ii  libkf5configcore5:amd64                        5.115.0-0ubuntu5                                 amd64        configuration settings framework for Qt
ii  libkf5newstuffcore5:amd64                      5.115.0-0ubuntu5                                 amd64        Support for downloading application assets from the network.
ii  libsigc++-2.0-0v5:amd64                        2.12.1-2                                         amd64        type-safe Signal Framework for C++ - runtime
ii  liburiparser1:amd64                            0.9.7+dfsg-2build1                               amd64        URI parsing library compliant with RFC 3986
ii  libze-intel-gpu1                               26.05.37020.3-0                                  amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  ocl-icd-libopencl1:amd64                       2.3.2-1build1                                    amd64        Generic OpenCL ICD Loader
ii  ocl-icd-libopencl1:i386                        2.3.2-1build1                                    i386         Generic OpenCL ICD Loader
ii  ocl-icd-opencl-dev:amd64                       2.3.2-1build1                                    amd64        OpenCL development files
ii  opencl-c-headers                               3.0~2023.12.14-1                                 all          OpenCL (Open Computing Language) C header files
ii  opencl-clhpp-headers                           3.0~2023.12.14-1ubuntu1                          all          C++ headers for OpenCL development
ii  pocl-opencl-icd:amd64                          5.0-2.1build3                                    amd64        pocl ICD
ii  python3-idna                                   3.6-2ubuntu0.1                                   all          Python IDNA2008 (RFC 5891) handling (Python 3)
ii  rocm-opencl                                    2.0.0.70200-43~24.04                             amd64        clr built using CMake
ii  rocm-opencl-dev                                2.0.0.70200-43~24.04                             amd64        clr built using CMake
ii  rocm-opencl-runtime                            7.2.0.70200-43~24.04                             amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-opencl-sdk                                7.2.0.70200-43~24.04                             amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rpp                                            2.2.0.70200-43~24.04                             amd64        Computer vision library for AMD CPUs and GPUs with HOST/HIP/OpenCL back-ends
ii  rpp-dev                                        2.2.0.70200-43~24.04                             amd64        Computer vision library for AMD CPUs and GPUs with HOST/HIP/OpenCL back-ends

Driver Installation Details

Vanilla Ubuntu 24.04.4 LTS, with OpenCL runtime installation following https://github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md#0-install-gpu-drivers-and-opencl-runtime

Linux Distribution

Ubuntu 24.04.4 LTS

Kernel Version & Boot Parameters

moritz@opencl-pc:~/Documents/FluidX3D$ uname -r
6.17.0-19-generic

Actual Behavior

Running the FluidX3D multi-GPU benchmark with 2 GPUs hangs at initialization, at the first OpenCL kernel execution, when one of the GPUs is an Intel Arc B580.

Any other combination of GPUs, e.g. Nvidia Titan Xp + Intel UHD 770, or AMD RX 7700 XT + Nvidia Titan Xp, works without issues.

I tracked the hang down to the execution of two small OpenCL kernels: transfer_extract_rho_u_flags and transfer__insert_rho_u_flags. Here is the entire relevant code section:

#define uxx uint
uxx index(const uint3 xyz) { // assemble 1D index from 3D coordinates (x,y,z -> n)
        return (uxx)xyz.x+(uxx)(xyz.y+xyz.z*def_Ny)*(uxx)def_Nx; // n = x+(y+z*Ny)*Nx
}
uint get_area(const uint direction) {
        const uint A[3] = { def_Ax, def_Ay, def_Az };
        return A[direction];
}
uxx index_extract_p(const uint a, const uint direction) {
        const uint3 coordinates[3] = { (uint3)(def_Nx-2u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz, def_Ny-2u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx, def_Nz-2u) };
        return index(coordinates[direction]);
}
uxx index_extract_m(const uint a, const uint direction) {
        const uint3 coordinates[3] = { (uint3)(       1u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz,        1u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx,        1u) };
        return index(coordinates[direction]);
}
uxx index_insert_p(const uint a, const uint direction) {
        const uint3 coordinates[3] = { (uint3)(def_Nx-1u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz, def_Ny-1u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx, def_Nz-1u) };
        return index(coordinates[direction]);
}
uxx index_insert_m(const uint a, const uint direction) {
        const uint3 coordinates[3] = { (uint3)(       0u, a%def_Ny, a/def_Ny), (uint3)(a/def_Nz,        0u, a%def_Nz), (uint3)(a%def_Nx, a/def_Nx,        0u) };
        return index(coordinates[direction]);
}
void extract_rho_u_flags(const uint a, const uint A, const uxx n, global char* transfer_buffer, const global float* rho, const global float* u, const global uchar* flags) {
        ((global float*)transfer_buffer)[      a] = rho[               n];
        ((global float*)transfer_buffer)[    A+a] = u[                 n];
        ((global float*)transfer_buffer)[ 2u*A+a] = u[    def_N+(ulong)n];
        ((global float*)transfer_buffer)[ 3u*A+a] = u[2ul*def_N+(ulong)n];
        ((global uchar*)transfer_buffer)[16u*A+a] = flags[             n]; // <-- this causes the hang on Intel Arc B580!
}
void insert_rho_u_flags(const uint a, const uint A, const uxx n, const global char* transfer_buffer, global float* rho, global float* u, global uchar* flags) {
        rho[               n] = ((const global float*)transfer_buffer)[      a];
        u[                 n] = ((const global float*)transfer_buffer)[    A+a];
        u[    def_N+(ulong)n] = ((const global float*)transfer_buffer)[ 2u*A+a];
        u[2ul*def_N+(ulong)n] = ((const global float*)transfer_buffer)[ 3u*A+a];
        flags[             n] = ((const global uchar*)transfer_buffer)[16u*A+a]; // <-- this causes extreme slowdown on Intel Arc B580!
}
kernel void transfer_extract_rho_u_flags(const uint direction, const ulong t, global char* transfer_buffer_p, global char* transfer_buffer_m, const global float* rho, const global float* u, const global uchar* flags) {
        const uint a=get_global_id(0), A=get_area(direction); // a = domain area index for each side, A = area of the domain boundary
        if(a>=A) return; // area might not be a multiple of cl_workgroup_size, so return here to avoid writing in unallocated memory space
        extract_rho_u_flags(a, A, index_extract_p(a, direction), transfer_buffer_p, rho, u, flags);
        extract_rho_u_flags(a, A, index_extract_m(a, direction), transfer_buffer_m, rho, u, flags);
}
kernel void transfer__insert_rho_u_flags(const uint direction, const ulong t, const global char* transfer_buffer_p, const global char* transfer_buffer_m, global float* rho, global float* u, global uchar* flags) {
        const uint a=get_global_id(0), A=get_area(direction); // a = domain area index for each side, A = area of the domain boundary
        if(a>=A) return; // area might not be a multiple of cl_workgroup_size, so return here to avoid writing in unallocated memory space
        insert_rho_u_flags(a, A, index_insert_p(a, direction), transfer_buffer_p, rho, u, flags);
        insert_rho_u_flags(a, A, index_insert_m(a, direction), transfer_buffer_m, rho, u, flags);
}

The code lines responsible for the hang are:

        ((global uchar*)transfer_buffer)[16u*A+a] = flags[             n]; // <-- this causes the hang on Intel Arc B580!
        flags[             n] = ((const global uchar*)transfer_buffer)[16u*A+a]; // <-- this causes extreme slowdown on Intel Arc B580!

The issue is not the cast from global char* transfer_buffer to (global uchar*)transfer_buffer, or from const global char* transfer_buffer to (const global uchar*)transfer_buffer, but the 8-bit integer memory load/store itself. The store causes the hang; the load causes an extreme slowdown in all subsequent kernel executions. This is a regression in the Intel GPU driver. Please fix!

Expected Behavior

The expected behavior is that FluidX3D successfully initializes and completes the benchmark on any combination of 2 or more GPUs, without hang or slowdown.

Reproduction Rate

Always reproduces - 100%

Steps to Reproduce

You need a system with an Intel Arc B580 GPU, Ubuntu 24.04.4 LTS, and Intel OpenCL runtime installed according to here and here.

Clone FluidX3D:

git clone https://github.com/ProjectPhysX/FluidX3D.git && cd FluidX3D
chmod +x make.sh

Modify src/setup.cpp:

  • comment out here:
      		//LBM lbm(256u, 256u, 256u, 1.0f); // default
  • uncomment here:
    		const uint memory = 1488u; // memory occupation in MB (for multi-GPU benchmarks: make this close to as large as the GPU's VRAM capacity)
    		const uint3 lbm_N = (resolution(float3(1.0f, 1.0f, 1.0f), memory)/4u)*4u; // input: simulation box aspect ratio and VRAM occupation in MB, output: grid resolution
    		//LBM lbm(1u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 1u, 1u, 1u, 1.0f); // 1 GPU
    		LBM lbm(2u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 2u, 1u, 1u, 1.0f); // 2 GPUs

Compile and run on Intel Arc B580 + Intel Arc B580 (yes, you can select the same GPU ID twice). Check that the OpenCL device ID (here 0) for Intel Arc B580 is selected twice:

./make.sh 0 0

You will then see a hang here:

moritz@opencl-pc:~/Documents/FluidX3D$ ./make.sh 0 0
Info: Detected Operating System: Linux
Info: Compiling with 24 CPU cores.
g++ -c src/kernel.cpp -o temp/kernel.o -std=c++17 -pthread -O -Wno-comment
g++ temp/*.o -o bin/FluidX3D -std=c++17 -pthread -O -Wno-comment -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 3.6 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) Arc(TM) B580 Graphics                             |
| Device ID    1 | Intel(R) UHD Graphics 770                                  |
| Device ID    2 | NVIDIA TITAN Xp                                            |
| Device ID    3 | 13th Gen Intel(R) Core(TM) i7-13700K                       |
| Device ID    4 | AMD Radeon RX 7700 XT                                      |
| Device ID    5 | cpu-haswell-13th Gen Intel(R) Core(TM) i7-13700K           |
|----------------'------------------------------------------------------------|
| Info: Creating domains                                                      |
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Arc(TM) B580 Graphics                             |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 26.05.37020.3 (Linux)                                      |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 160 at 2850 MHz (2560 cores, 14.592 TFLOPs/s)              |
| Memory, Cache  | 12215 MB VRAM, 18432 KB global / 128 KB local              |
| Buffer Limits  | 11605 MB global, 11883724 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Arc(TM) B580 Graphics                             |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 26.05.37020.3 (Linux)                                      |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 160 at 2850 MHz (2560 cores, 14.592 TFLOPs/s)              |
| Memory, Cache  | 12215 MB VRAM, 18432 KB global / 128 KB local              |
| Buffer Limits  | 11605 MB global, 11883724 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| Info: Allocating memory. This may take a few seconds.                       |

Without the hang, the output would continue:

|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                528 x 264 x 264 = 36799488 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 596 MB, GPU 2x 972 MB |
| Max Alloc Size  |                                                    666 MB |
| Time Steps      |                                                       500 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 373 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    4360 |    336 GB/s |       118 |          497  99% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4414                                                   |
moritz@opencl-pc:~/Documents/FluidX3D$

Is this a regression?

  • Yes, this is a regression - functionality that previously worked is now broken

Last Known Working Driver Version

25.18.33578.6

First Known Failing Driver Version

26.05.37020.3

Source Code / Reproducer

https://github.com/ProjectPhysX/FluidX3D

Metadata

Assignees

No one assigned

    Labels

    OS: Linux - Issue specific to Linux distributions (Ubuntu, Fedora, RHEL, etc.)
    Type: Bug - General bug report, unexpected behavior or crash
    Type: Regression - Previously working functionality is now broken
