Skip to content

Protect concurrent grant updates (fix for error pq: tuple concurrently updated) #520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

gesanderson
Copy link

@gesanderson gesanderson commented Mar 13, 2025

Problem description

Postgres does not support concurrent modifications to the pg_class catalog. This catalog manages grants on objects such as databases, schemas or tables. Attempting to update grants concurrently returns an error such as:

Error: could not execute revoke query: pq: tuple concurrently updated

This error does not break the database, but it does prevent terraform from applying changes.

Further details can be found in this pg thread: https://www.postgresql.org/message-id/3473.1393693757%40sss.pgh.pa.us

Context

I have a database with 400 partitions and a dozen roles each needing different kinds of grants. I would like to centralize and ease the management of these roles with a terraform stack. Importing the roles or adding new ones will trigger the bug since the grants are applied concurrently to all tables & partitions.

Prior work

A first fix was proposed in #178 but the original author never followed up

A second fix is open in #510 however it only protects concurrent grants on schemas. The same issue can happen on any kind of concurrent grant update.

I have reused the test setup first created by @kylejohnson and improved it to fully reproduce my issue locally.

Proposal

Add a new provider setting named lock_grants. When enabled, all changes to grants will acquire an advisory lock by reusing the pgLockDatabase function. This ensures only a single grant is executed at any time.

However there is a tradeoff: Execution time can potentially be slowed down significantly if there's a large quantity of postgresql_grant, postgresql_schema or postgresql_database resources that need to be applied. This is why the feature is optional and disabled by default.

Alternative implementations

I had 2 other ideas on how to solve this issue that are worth considering but would take more time to implement:

Retry an operation on error

Some providers such as the one for AWS have optional retry {} blocks that will retry an operation if it fails. A similar concept could be added to postgresql_grant resources.

This was my first idea, but I excluded it because it remains error prone. A user must go through multiple rounds of trials and errors to find the appropriate number of retries that fits their needs.

Individually lock each object

The provider could create individual advisory locks for each object being granted. There are 2 downsides to this idea:

  • Complexity. A lot more code is needed to handle each kind of object.
  • Potential for conflicting lock IDs if an object has the same OID as another advisory lock being used by an application
  • GRANT ... ON ALL OBJECTS needs to fetch a list of all objects that must be locked and always lock them in the same order to avoid deadlocks.

Work arounds

There are 2 workarounds to this issue:

Limiting parallelism during an apply

Using terraform apply -parallelism=1 prevents the issue. However it also slows down applying all changes made by terraform

Limiting the number of connections

Configuring the provider with max_connections = 1 will decrease the chances of getting the error, but it will not completely eliminate it. In my tests I would still get the error on occasion. I have not investigated why, but I suspect an implementation flaw in the golang postgresql driver.

@jarpoole
Copy link

Awesome work! It would be really cool if we could get a beta release similar to last time (1.21.1-beta.1) to validate this change in the real world!

@gesanderson
Copy link
Author

gesanderson commented Mar 14, 2025

It would be really cool if we could get a beta release similar to last time (1.21.1-beta.1)

I believe that can only be done by people with write access to the repo (which is not my case). It seems @kylejohnson released the beta version last time, so perhaps we could ask either him or @cyrilgdn ?

@xadereq
Copy link

xadereq commented Apr 9, 2025

Can we get it merged? @kylejohnson / @cyrilgdn

@AutomationD
Copy link

@cyrilgdn is there anything someone could do to get this PR moving please?

@mattthaber
Copy link

This is much needed

@AutomationD
Copy link

It would be really cool if we could get a beta release similar to last time (1.21.1-beta.1)

I believe that can only be done by people with write access to the repo (which is not my case). It seems @kylejohnson released the beta version last time, so perhaps we could ask either him or @cyrilgdn ?

@gesanderson Would you be open to temporary publishing a forked provider to Terraform Registry, until this one gets going?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants