[Bug]: GPU_HANG on H264 decoding in vcs0 #1969

@cordlandwehr

Which component impacted?

Decode

Is it a regression? Good in old configuration?

No, this issue has existed for a long time

What happened?

We run three parallel GStreamer pipelines that receive IP camera traffic as UDP packets with H264 payload and decode it for display on the screen. After the streams have run for a couple of minutes, the GPU hangs, which usually (but not always) results in a kernel reboot.

What's the usage scenario when you are seeing the problem?

Transcode for media delivery

What impacted?

No response

Debug Information

What's libva/libva-utils/gmmlib/media-driver version?

  • intel-media-driver: 22.3.1
  • libva: 2.14.0
  • libva-utils: 2.14.0
  • gmmlib: 22.1.2

Could you confirm whether GPU hardware exists or not by ls /dev/dri?
by-path card0 renderD128

Could you provide the GPU hardware information by lspci -nn |grep -Ei 'VGA|DISPLAY'?
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:4555] (rev 01)

Could you provide vainfo log by vainfo >vainfo.log 2>&1?

  • see attached

Could you provide libva trace log? Run cmd export LIBVA_TRACE=/tmp/libva_trace.log first then execute the case.

  • see attached output of /sys/class/drm/card0/error
  • will also do libva_trace as follow-up

Could you attach dmesg log if GPU hang by dmesg >dmesg.log 2>&1?
08:36:49.493 UTC i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
08:36:49.502 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:36:49.518 UTC i915 0000:00:02.0: [drm] GPU HANG: ecode 11:4:2cfffffd, in FRONT_COUPLING- [51352]
08:37:04.475 UTC i915 0000:00:02.0: [drm] GPU HANG: ecode 11:4:2cfffffd, in FRONT_COUPLING- [51352]
08:37:04.484 UTC i915 0000:00:02.0: [drm] Resetting vcs0 for stopped heartbeat on vcs0
08:37:04.485 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.506 UTC i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on vcs0
08:37:04.620 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.621 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.645 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.671 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.671 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.696 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.731 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.732 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.755 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.811 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.812 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.831 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.832 UTC i915 0000:00:02.0: [drm] ERROR Failed to reset chip
08:37:04.832 UTC i915 0000:00:02.0: [drm:add_taint_for_CI [i915]] CI tainted:0x9 by intel_gt_reset+0x2d9/0x300 [i915]
08:37:04.953 UTC [drm:__uc_sanitize [i915]] ERROR Failed to reset GuC, ret = -110
08:37:04.965 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.977 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:04.990 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
08:37:05.000 UTC i915 0000:00:02.0: [drm] FRONT_COUPLING-[51352] context reset due to GPU hang
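
For triage, the dmesg excerpt above can be condensed into a few numbers (which engine timed out, how often, and which process was blamed). The following is a minimal sketch, assuming log lines in exactly the format shown above; the function name `summarize_i915_log` and the regular expressions are illustrative and not part of any i915 or media-driver tooling.

```python
import re
from collections import Counter

# Patterns derived from the dmesg lines quoted in this report (assumed format).
HANG_RE = re.compile(r"GPU HANG: ecode ([0-9a-f:]+), in (\S+) \[(\d+)\]")
TIMEOUT_RE = re.compile(r"ERROR (\w+) reset request timed out")

def summarize_i915_log(lines):
    """Count reset timeouts per engine and collect GPU HANG records."""
    hangs = []                 # (ecode, process name, pid) per GPU HANG line
    timeouts = Counter()       # engine name -> number of timed-out reset requests
    chip_reset_failed = False  # True if a full chip reset failure was logged
    for line in lines:
        hang = HANG_RE.search(line)
        if hang:
            hangs.append((hang.group(1), hang.group(2), int(hang.group(3))))
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            timeouts[timeout.group(1)] += 1
        elif "Failed to reset chip" in line:
            chip_reset_failed = True
    return {
        "hangs": hangs,
        "timeouts": dict(timeouts),
        "chip_reset_failed": chip_reset_failed,
    }

# Four lines copied from the log above, used as sample input.
SAMPLE = [
    "08:36:49.493 UTC i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out",
    "08:36:49.502 UTC i915 0000:00:02.0: [drm] ERROR vcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}",
    "08:36:49.518 UTC i915 0000:00:02.0: [drm] GPU HANG: ecode 11:4:2cfffffd, in FRONT_COUPLING- [51352]",
    "08:37:04.832 UTC i915 0000:00:02.0: [drm] ERROR Failed to reset chip",
]

if __name__ == "__main__":
    print(summarize_i915_log(SAMPLE))
```

To run it against a full capture, pass the lines of the file written by `dmesg >dmesg.log 2>&1`, for example `summarize_i915_log(open("dmesg.log"))`.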

Do you want to contribute a patch to fix the issue?

None
