Protect concurrent grant updates (fix for error pq: tuple concurrently updated) #520
Awesome work! It would be really cool if we could get a beta release similar to last time (1.21.1-beta.1) to validate this change in the real world!
I believe that can only be done by people with write access to the repo (which is not my case). It seems @kylejohnson released the beta version last time, so perhaps we could ask either him or @cyrilgdn?
Can we get it merged? @kylejohnson / @cyrilgdn
@cyrilgdn is there anything someone could do to get this PR moving please?
This is much needed
@gesanderson Would you be open to temporarily publishing a forked provider to Terraform Registry, until this one gets going?
Problem description
Postgres does not support concurrent modifications to the `pg_class` catalog. This catalog manages grants on objects such as databases, schemas or tables. Attempting to update grants concurrently returns an error such as:
pq: tuple concurrently updated
This error does not break the database, but it does prevent terraform from applying changes.
Further details can be found in this pg thread: https://www.postgresql.org/message-id/3473.1393693757%40sss.pgh.pa.us
Context
I have a database with 400 partitions and a dozen roles each needing different kinds of grants. I would like to centralize and ease the management of these roles with a terraform stack. Importing the roles or adding new ones will trigger the bug since the grants are applied concurrently to all tables & partitions.
Prior work
A first fix was proposed in #178, but the original author never followed up.
A second fix is open in #510; however, it only protects concurrent grants on schemas. The same issue can happen on any kind of concurrent grant update.
I have reused the test setup first created by @kylejohnson and improved it to fully reproduce my issue locally.
Proposal
Add a new provider setting named `lock_grants`. When enabled, all changes to grants will acquire an advisory lock by reusing the `pgLockDatabase` function. This ensures only a single grant is executed at any time.
However, there is a tradeoff: execution time can potentially be slowed down significantly if there is a large number of `postgresql_grant`, `postgresql_schema` or `postgresql_database` resources that need to be applied. This is why the feature is optional and disabled by default.
Alternative implementations
I had 2 other ideas on how to solve this issue that are worth considering but would take more time to implement:
Retry an operation on error
Some providers, such as the one for AWS, have optional `retry {}` blocks that will retry an operation if it fails. A similar concept could be added to `postgresql_grant` resources.
This was my first idea, but I excluded it because it remains error prone. A user must go through multiple rounds of trial and error to find the appropriate number of retries that fits their needs.
Individually lock each object
The provider could create individual advisory locks for each object being granted. There are 2 downsides to this idea:
`GRANT ... ON ALL OBJECTS` needs to fetch a list of all objects that must be locked and always lock them in the same order to avoid deadlocks.
Workarounds
There are 2 workarounds to this issue:
Limiting parallelism during an apply
Using `terraform apply -parallelism=1` prevents the issue. However, it also slows down applying all changes made by terraform.
Limiting the number of connections
Configuring the provider with `max_connections = 1` will decrease the chances of getting the error, but it will not completely eliminate it. In my tests I would still get the error on occasion. I have not investigated why, but I suspect an implementation flaw in the golang postgresql driver.
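In contrast to both workarounds, the `lock_grants` proposal serializes grants inside the database itself. The Go sketch below illustrates the general idea of taking a transaction-scoped advisory lock before running a `GRANT`. It is an illustration under stated assumptions, not the provider's actual `pgLockDatabase` code: `grantWithLock`, `lockKey`, the `execer` interface and the `recorder` fake are hypothetical names, and deriving the lock key by hashing the database name is an assumption. `pg_advisory_xact_lock` is the standard Postgres function that holds a lock until the surrounding transaction ends.

```go
package main

import (
	"database/sql"
	"fmt"
	"hash/fnv"
)

// execer is the subset of *sql.Tx this sketch needs. Declaring it as an
// interface lets the example run without a live database.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// lockKey (hypothetical) derives a stable 64-bit advisory lock key from a
// name. Hashing the database name is an assumption of this sketch.
func lockKey(name string) int64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	return int64(h.Sum64())
}

// grantWithLock (hypothetical) acquires a transaction-scoped advisory lock
// and only then runs the grant statement. Concurrent callers using the same
// key block on the first statement until the holder commits or rolls back.
func grantWithLock(tx execer, database, grantSQL string) error {
	if _, err := tx.Exec("SELECT pg_advisory_xact_lock($1)", lockKey(database)); err != nil {
		return fmt.Errorf("acquiring advisory lock: %w", err)
	}
	if _, err := tx.Exec(grantSQL); err != nil {
		return fmt.Errorf("applying grant: %w", err)
	}
	return nil
}

// recorder is a fake transaction that records the statements it receives.
type recorder struct{ queries []string }

func (r *recorder) Exec(query string, args ...any) (sql.Result, error) {
	r.queries = append(r.queries, query)
	return nil, nil
}

func main() {
	rec := &recorder{}
	err := grantWithLock(rec, "mydb", "GRANT SELECT ON TABLE events TO reader")
	fmt.Println(rec.queries, err)
}
```

Because the lock is held until the transaction ends, every grant waits for the previous one, which is the source of the slowdown mentioned in the proposal.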