Description
Which component is impacted?
Decode
Is it a regression? Did it work in an older configuration?
No, this issue has existed for a long time.
What happened?
We run three parallel GStreamer pipelines that receive IP camera streams as UDP packets carrying an H264 payload and decode them for display on screen. After the streams have run for a couple of minutes, the GPU hangs, which usually (but not always) results in a kernel reboot.
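The report does not include the actual pipeline definitions. As a rough, hypothetical sketch of what a reproduction might look like, the script below builds three parallel decode pipelines with gst-launch-1.0. It assumes RTP-packetized H.264 (payload type 96) on UDP ports 5000 to 5002 and the gstreamer-vaapi elements vaapih264dec and vaapisink; none of these details come from the original setup, and on newer GStreamer releases the va plugin's vah264dec could be used instead. It only prints the commands by default; setting RUN=1 launches them as background jobs.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction sketch: three parallel RTP/H.264-over-UDP decode
# pipelines. Ports, caps, and the vaapih264dec/vaapisink elements are
# assumptions, not taken from the reporter's setup.
# Default is a dry run that prints the commands; set RUN=1 to launch them.
set -u

PORTS="${PORTS:-5000 5001 5002}"   # assumed: one UDP port per camera
RUN="${RUN:-0}"

for port in $PORTS; do
  cmd=(gst-launch-1.0 -e
       udpsrc "port=$port"
         "caps=application/x-rtp,media=video,clock-rate=90000,encoding-name=H264,payload=96"
       '!' rtph264depay '!' h264parse '!' vaapih264dec '!' vaapisink)
  if [ "$RUN" = "1" ]; then
    "${cmd[@]}" &        # run each pipeline as a separate background job
  else
    echo "${cmd[*]}"     # dry run: show the command that would be launched
  fi
done

# Wait for all background pipelines (returns immediately in dry-run mode).
wait
```

Running one decode process per stream keeps the three sessions independent, which mirrors the "three parallel pipelines" described above; exporting LIBVA_TRACE before RUN=1 would capture the libva trace requested later in the template.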
What's the usage scenario when you are seeing the problem?
Transcode for media delivery
What is impacted?
No response
Debug Information
What are the libva/libva-utils/gmmlib/media-driver versions?
- intel-media-driver: 22.3.1
- libva: 2.14.0
- libva-utils: 2.14.0
- gmmlib: 22.1.2
Could you confirm whether the GPU hardware exists by running ls /dev/dri?
by-path card0 renderD128
Could you provide the GPU hardware information from lspci -nn |grep -Ei 'VGA|DISPLAY'?
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:4555] (rev 01)
Could you provide the vainfo log from vainfo >vainfo.log 2>&1?
- see attached
Could you provide a libva trace log? Run export LIBVA_TRACE=/tmp/libva_trace.log first, then execute the case.
- Attached instead: the output of /sys/class/drm/card0/error
- A libva trace will be provided as a follow-up
Could you attach the dmesg log if the GPU hangs (dmesg >dmesg.log 2>&1)?
08:36:49.493 UTC i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
08:36:49.502 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:36:49.518 UTC i915 0000:00:02.0: [drm] GPU HANG: ecode 11:4:2cfffffd, in FRONT_COUPLING- [51352]
08:37:04.475 UTC i915 0000:00:02.0: [drm] GPU HANG: ecode 11:4:2cfffffd, in FRONT_COUPLING- [51352]
08:37:04.484 UTC i915 0000:00:02.0: [drm] Resetting vcs0 for stopped heartbeat on vcs0
08:37:04.485 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.506 UTC i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on vcs0
08:37:04.620 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.621 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.645 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.671 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.671 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.696 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.731 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.732 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.755 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.811 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.812 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.831 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.832 UTC i915 0000:00:02.0: [drm] ERROR Failed to reset chip
08:37:04.832 UTC i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_reset+0x2d9/0x300 [i915]
08:37:04.953 UTC [drm:__uc_sanitize [i915]] ERROR Failed to reset GuC, ret = -110
08:37:04.965 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.977 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.990 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:05.000 UTC i915 0000:00:02.0: [drm] FRONT_COUPLING-[51352] context reset due to GPU hang
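The template asks for several separate pieces of debug information. A minimal sketch that gathers them into one directory is shown below; it only runs the commands named in the template (ls /dev/dri, lspci, vainfo, dmesg, and the /sys/class/drm/card0/error dump), while the OUT_DIR variable and the output file names are hypothetical. Tools that are missing or need root simply leave their error message in the corresponding file. The libva trace is not included because LIBVA_TRACE has to be exported before the decode workload itself is started.

```shell
#!/usr/bin/env bash
# Sketch: collect the debug information requested by the issue template.
# OUT_DIR and the file names are hypothetical; a missing tool or a permission
# error is recorded in the output file instead of aborting the script.
set -u

OUT_DIR="${OUT_DIR:-./gpu-hang-debug}"
mkdir -p "$OUT_DIR"

ls /dev/dri                                > "$OUT_DIR/dev_dri.txt"    2>&1
{ lspci -nn | grep -Ei 'VGA|DISPLAY'; }    > "$OUT_DIR/lspci.txt"      2>&1
vainfo                                     > "$OUT_DIR/vainfo.log"     2>&1
dmesg                                      > "$OUT_DIR/dmesg.log"      2>&1
# i915 error state captured at the time of the hang (usually needs root).
cat /sys/class/drm/card0/error             > "$OUT_DIR/i915_error.txt" 2>&1

# Keep only the i915 lines, where the GPU HANG and reset messages appear.
grep -E 'i915|GPU HANG' "$OUT_DIR/dmesg.log" > "$OUT_DIR/dmesg_i915.log" || true

echo "Debug information written to $OUT_DIR"
```

Running it as root right after a hang (before a reboot clears the kernel log) gives the best chance of capturing both the dmesg reset sequence and the error state in one place.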
Do you want to contribute a patch to fix the issue?
None