Skip to content

Deadlock (or TOO LONG query) #211

@tarkhil

Description

@tarkhil

On ONE of my servers, I'm getting an error

cld-bi-dwh02d: Error retrieving datasource data pg_wait_sampling: canceling statement due to lock timeout CONTEXT: SQL statement " 
-- the various background processes report wait events but don't have 
-- associated queryid. Gather them all under a fake 0 dbid 
SELECT now(), COALESCE(pgss.dbid, 0) AS dbid, s.event_type, s.event, s.queryid, sum(s.count) as count FROM public.pg_wait_sampling_profile s 
-- pg_wait_sampling doesn't offer a per (userid, dbid, queryid) view, 
-- only per pid, but pid can be reused for different databases or users 
-- so we cannot deduce db or user from it. However, queryid should be 
-- unique across differet databases, so we retrieve the dbid this way. 
-- Note that the same queryid can exists for multiple entries if 
-- multiple users execute the query, so it's critical to retrieve a 
-- single row from pg_stat_statements per (dbid, queryid) 
LEFT JOIN (SELECT DISTINCT s2.dbid, s2.queryid FROM public.pg_stat_statements(false) s2 ) pgss ON pgss.queryid = s.queryid WHERE s.event_type IS NOT NULL AND s.event IS NOT NULL AND COALESCE(pgss.dbid, 0) NOT IN ( SELECT oid FROM public.powa_databases WHERE dropped IS NOT NULL ) GROUP BY pgss.dbid, s.event_type, s.event, s.queryid 
" PL/pgSQL function public.powa_wait_sampling_src(integer) line 17 at RETURN QUERY

Postgres 14.20 there (can't upgrade right now).

Exploring postgres got me

postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, query, state
postgres-# FROM pg_stat_activity
postgres-# WHERE state not like 'idle%' AND query NOT ILIKE '%pg_stat_activity%'
postgres-# ORDER BY query_start desc;
   pid   |          age          | usename  |                               query                                | state
---------+-----------------------+----------+--------------------------------------------------------------------+--------
 1055561 | 1 day 15:39:54.16436  | postgres | SET search_path TO pg_catalog;SELECT public.powa_take_snapshot()   | active
 1055537 | 1 day 15:40:01.829876 | powa     | COPY (SELECT 2, * FROM public.powa_wait_sampling_src(0)) TO stdout | active
(2 rows)

What could cause the problem?

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions