Race condition if two worker instances attempt to initialize clean database in parallel #496

Open
@benjie

Description

This only affects the first run of worker against your database, so if you already have worker running you're fine.

If you run 10 graphile-worker instances against a clean database all at the same time, there's a good chance that in this code:

worker/src/migrate.ts

Lines 117 to 137 in 436e299

for (let attempts = 0; attempts < 2; attempts++) {
  try {
    const {
      rows: [row],
    } = await event.client.query(
      `select current_setting('server_version_num') as server_version_num,
      (select id from ${escapedWorkerSchema}.migrations order by id desc limit 1) as id,
      (select id from ${escapedWorkerSchema}.migrations where breaking is true order by id desc limit 1) as biggest_breaking_id;`,
    );
    latestMigration = row.id;
    latestBreakingMigration = row.biggest_breaking_id;
    event.postgresVersion = checkPostgresVersion(row.server_version_num);
  } catch (e) {
    if (attempts === 0 && (e.code === "42P01" || e.code === "42703")) {
      await installSchema(compiledSharedOptions, event);
    } else {
      throw e;
    }
  }
}

two or more will hit the requisite 42P01/42703 error (undefined table / undefined column), each of them will attempt to install the schema, and only one can succeed. The others will throw an error and exit.

There are two bugs here:

  1. The installSchema call should be wrapped in try/catch so that errors thrown there trigger a retry of the whole loop.
  2. There's no sleep(), so it retries instantly, which can be painful when many instances collide; we should use randomized back-off.
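
A minimal sketch of how the two fixes could fit together, not the actual patch. The names `sleep`, `backoffDelay` and `readOrInstall` are hypothetical, as are the delay bounds and attempt count; `read` stands in for the migrations query and `install` for the installSchema call. A failed install is treated as "another instance probably won the race", followed by a randomized wait and a re-read.

```typescript
// Hypothetical helpers for illustration; none of these exist in graphile-worker.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Random delay in [0, min(maxMs, baseMs * 2^attempt)), so that instances that
// collided once are unlikely to retry at the same instant.
function backoffDelay(attempt: number, baseMs = 50, maxMs = 2000): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * cap);
}

async function readOrInstall<T>(
  read: () => Promise<T>, // e.g. the migrations query
  install: () => Promise<void>, // e.g. the schema installation
  maxAttempts = 5, // assumed limit, tune as needed
): Promise<T> {
  let lastInstallError: unknown;
  for (let attempt = 0; ; attempt++) {
    try {
      return await read();
    } catch (e: any) {
      // 42P01 = undefined_table, 42703 = undefined_column
      const schemaMissing = e?.code === "42P01" || e?.code === "42703";
      if (!schemaMissing) throw e;
      // Out of attempts: surface the install failure if there was one.
      if (attempt >= maxAttempts - 1) throw lastInstallError ?? e;
    }
    try {
      await install();
    } catch (installError) {
      // Possibly lost the race to another instance: wait a random interval,
      // then loop round and re-read instead of exiting.
      lastInstallError = installError;
      await sleep(backoffDelay(attempt));
    }
  }
}
```

With this shape, an instance that loses the race simply re-reads the migrations table after a short random pause, while a genuine installation failure still surfaces once the attempts are exhausted.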
