
Optimize DB Schema & Query for Top-Earning Leaderboard #340


Open · wants to merge 17 commits into main from Optimise-Leaderboard-FIL-Earned

Conversation

Hany-Almnaem

Refactor DB schema: Add participant_id & foreign keys for performance

Changes Made:

  • Added address_mapping table and migrations.
  • Updated daily_scheduled_rewards and daily_reward_transfers to reference participant_id.
  • Moved to a stable, ID-based lookup approach (sketched after the migration list below).

Performance:

EXPLAIN ANALYZE (query-plan output collapsed in the original thread)
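For reference, a comparison like this can be reproduced with a query-plan check. The sketch below is an assumption, not the PR's exact code: it supposes the leaderboard aggregates daily_reward_transfers per participant through the new mapping table, with names following the migrations listed under Migration Steps.

import pg from 'pg'

const client = new pg.Client({ connectionString: process.env.DATABASE_URL })
await client.connect()

// Ask Postgres to plan and execute the leaderboard query, reporting timings.
const { rows } = await client.query(`
  EXPLAIN ANALYZE
  SELECT m.participant_address, SUM(t.amount) AS total_earned
  FROM daily_reward_transfers t
  JOIN address_mapping m ON m.id = t.participant_id
  GROUP BY m.participant_address
  ORDER BY total_earned DESC
  LIMIT 100
`)
console.log(rows.map(r => r['QUERY PLAN']).join('\n'))
await client.end()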

Migration Steps:
new migration files/
├── ...
├── 008.do.create-address-mapping-table.sql
├── 009.do.backfill-address-mapping.sql
├── 010.do.add-participant_id-columns.sql
├── 011.do.populate-participant-ids.sql
├── 012.do.add-foreign-keys-and-indexes.sql
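For illustration, the gist of migrations 008 and 010 might look roughly like this. This is a sketch only: the real files are plain SQL executed by the migration runner, and the exact identifiers and column types are assumptions based on the file names above.

// 008.do.create-address-mapping-table.sql (sketch, types assumed)
await pgPools.stats.query(`
  CREATE TABLE address_mapping (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    participant_address TEXT NOT NULL UNIQUE
  )
`)

// 010.do.add-participant_id-columns.sql (sketch)
await pgPools.stats.query(`
  ALTER TABLE daily_scheduled_rewards ADD COLUMN participant_id BIGINT;
  ALTER TABLE daily_reward_transfers ADD COLUMN participant_id BIGINT;
`)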

Please review and let me know if any further changes are needed.

Closes CheckerNetwork/roadmap#178

Contributor

@pyropy pyropy left a comment

Great work, @Hany-Almnaem! 🚀 Thanks for your submission! 🙏🏻

I'm wondering if we should reduce the number of migrations by combining some of them into a single file. For example, we could merge new table creation with backfilling or group the changes to daily_scheduled_rewards and daily_reward_transfers tables along with their backfilling. What do you think?

Let's also wait for feedback from others, but overall, this is looking great! 👍🏻

@Hany-Almnaem
Author

@pyropy Thanks for your feedback!

> I'm wondering if we should reduce the number of migrations by combining some of them into a single file.

It's a great idea to reduce the number of migrations, but I recommend keeping each logical change in its own migration – it's usually easier to debug and revert that way. That said, combining steps is possible if it won't cause issues later. In general, small, incremental migrations are the safest bet.

@bajtos
Member

bajtos commented Mar 12, 2025

@Hany-Almnaem thank you for the pull request. To clarify: is this superseding your earlier PR #324? Can we close #324 now?

There are several failed CI checks; please take a look and fix them. (You should be able to reproduce them locally by running npm run test.)

Member

@bajtos bajtos left a comment

The high-level direction looks good to me 👍🏻

Let's discuss the implementation details now.

@bajtos
Member

bajtos commented Mar 12, 2025

@Hany-Almnaem I have a question about the performance.

  • In the previous PR, the current query takes 26ms to plan and 988ms to execute.
  • In the previous PR, the new query takes 2ms to plan and 580ms to execute.
  • In this PR, the new query takes 44ms to plan and 1327ms to execute.

That looks like a step in the wrong direction to me. We want to improve the performance of this query, not make it worse.

@Hany-Almnaem
Author

@bajtos You're right.
The increased execution time is likely due to the query modifications; the main factor could be inefficient index usage.
I'll review the indexing strategy, investigate further, and update the PR with improvements.

@Hany-Almnaem
Author

Supersedes Earlier PR #324

Summary of Changes

  1. Unified Table Definitions

Aligned the schema with spark-evaluate to keep everything consistent and reduce confusion.

  2. Combined Migrations and Clear Documentation

Merged multiple smaller migrations into one, adding detailed comments to explain each step and keep the migration history concise.

  3. Removed Old Columns

Dropped legacy columns that are no longer needed after the new schema changes.

  4. Added Composite Indexes

Added indexes for improved query performance when filtering by day and address.

  5. Updated Test Cases

Adjusted existing tests to match the new schema, ensuring they accurately reflect the latest logic and structures.

  6. Optimized Query Logic in API Fetchers

Switched to using participant IDs instead of participant addresses in joins, and leveraged the new indexes for faster lookups (see the sketch below).

With these changes, we achieve a more efficient database schema, better performance on large datasets, and clearer migration steps.
New EXPLAIN ANALYZE (output collapsed in the original thread)
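For concreteness, here is a sketch of what items 4 and 6 amount to. The index definition, function name, and query shape are assumptions for illustration, not the PR's exact code.

// Composite index from the migrations (names assumed):
//   CREATE INDEX daily_reward_transfers_day_participant_idx
//     ON daily_reward_transfers (day, participant_id);

// Leaderboard fetcher joining on the integer participant_id
// instead of comparing address strings.
export const fetchTopEarningParticipants = async (pgPool, from, to) => {
  const { rows } = await pgPool.query(`
    SELECT p.participant_address, SUM(t.amount) AS total_earned
    FROM daily_reward_transfers t
    JOIN participants p ON p.id = t.participant_id
    WHERE t.day >= $1 AND t.day <= $2
    GROUP BY p.participant_address
    ORDER BY total_earned DESC
  `, [from, to])
  return rows
}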

Feel free to review and let me know if anything needs further adjustment.

@bajtos bajtos requested review from bajtos and pyropy March 20, 2025 08:01
Member

@bajtos bajtos left a comment

Great progress!

Member

@bajtos bajtos left a comment

I cleaned up the first part of the patch, see the commits above. I need to take a closer look at the second part.

Comment on lines 118 to 157
{ args: { to: 'address1', amount: 250 }, blockNumber: 2000 }
{ args: { to: 'address1', amount: 150 }, blockNumber: 2000 }
Member

Ditto, let's revert.

@Hany-Almnaem force-pushed the Optimise-Leaderboard-FIL-Earned branch from 490af82 to 4081ba7 on April 7, 2025 at 11:47
@bajtos
Member

bajtos commented Apr 11, 2025


@Hany-Almnaem please don't force-push to pull requests; it makes it more difficult to incrementally review only what's changed since the last review. We use merge commits to bring new changes from the main branch: git merge main or the [Update branch] button in the GitHub UI.

Comment on lines 380 to 384
await pgPools.stats.query(`
INSERT INTO participants (id, participant_address)
VALUES (1, '0x20'), (2, '0x00')
`)
})
Member

We need to use mapParticipantsToIds instead of hard-coded participant ids.
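For example, assuming the helper's signature matches the spark-evaluate version, taking a database client and a Set of addresses and returning a Map from address to id, the setup could become:

// Import path is an assumption for illustration.
import { mapParticipantsToIds } from '@filecoin-station/spark-stats-db'

// Map the addresses to ids (inserting missing ones) instead of hard-coding them.
const idMap = await mapParticipantsToIds(pgPools.stats, new Set(['0x20', '0x00']))
const id1 = idMap.get('0x20')
const id2 = idMap.get('0x00')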

Member

I recommend creating a helper that will ensure the address is mapped to an ID and then call the sql query to insert into daily_scheduled_rewards.

Comment on lines 216 to 225
await pgPools.stats.query(`
INSERT INTO participants (id, participant_address)
VALUES
(1, 'to1'),
(2, 'to2'),
(3, 'to3'),
(4, 'address1'),
(5, 'address2'),
(6, 'address3')
`)
Member

Same here - use mapParticipantsToIds

Member

Do we actually need to prepare this mapping in advance? I would expect givenDailyRewardTransferMetrics to take care for that.

Comment on lines 299 to 308
await pgPools.stats.query(`
INSERT INTO participants (id, participant_address)
VALUES
(1, 'to1'),
(2, 'to2'),
(3, 'to3'),
(4, 'address1'),
(5, 'address2'),
(6, 'address3')
`)
Member

Ditto.

I recommend creating a helper that will update both participants and daily_scheduled_rewards tables, e.g. givenDailyScheduledRewards(day, participantAddress, scheduledRewards).
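A sketch of that helper, with the column names, import path, and the mapper's behaviour all assumed:

// Hypothetical test helper: ensures the address is mapped to a participant id,
// then inserts the scheduled-rewards row keyed by that id.
import { mapParticipantsToIds } from '@filecoin-station/spark-stats-db' // path assumed

export const givenDailyScheduledRewards = async (pgPool, day, participantAddress, scheduledRewards) => {
  const idMap = await mapParticipantsToIds(pgPool, new Set([participantAddress]))
  await pgPool.query(`
    INSERT INTO daily_scheduled_rewards (day, participant_id, scheduled_rewards)
    VALUES ($1, $2, $3)
  `, [day, idMap.get(participantAddress), scheduledRewards])
}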

Comment on lines 565 to 572
const participantResult = await pgPoolStats.query(
'SELECT id FROM participants WHERE participant_address = $1',
[transfer.toAddress]
)

if (participantResult.rows.length === 0) {
throw new Error(`Participant address ${transfer.toAddress} not found`)
}
Member

Let's use mapParticipantsToIds to map new addresses to participant ids, so that we don't have to do it manually in tests.
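Roughly, assuming the mapper inserts unknown addresses and returns their ids:

// Replace the manual SELECT above with the shared mapper.
const idMap = await mapParticipantsToIds(pgPoolStats, new Set([transfer.toAddress]))
const participantId = idMap.get(transfer.toAddress)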

Member

@bajtos bajtos left a comment

Thank you for the updates, @Hany-Almnaem. I appreciate your perseverance!

I did a bit of cleanup in 2475264 to speed things up.

I identified three more areas in tests that need improving to use mapParticipantsToIds, please take a look at my comments above.

@Hany-Almnaem
Author

@bajtos Thanks for your earlier feedback and the cleanup. Much appreciated!

I noticed the opportunity to unify the helper functions, but I decided to keep them separate for now to preserve clarity between different test contexts. Just wanted to call that out in case it comes up; happy to refactor further if needed.

Member

@bajtos bajtos left a comment

Can you please fix the linting errors?


[day, id, amount, lastCheckedBlock],
);
};
export { mapParticipantsToIds } from "../observer/lib/map-participants-to-ids.js";
Member

I find it confusing to have two functions called mapParticipantsToIds. Can we find a way to discriminate between the function to map participants in spark_evaluate database and the function to map participants in spark_stats database?

Author

Hey @bajtos, thanks for flagging that.
In the last commit, I’ve now stopped importing mapParticipantsToIds directly in tests and wrapped each use-case in its own helper:

givenDailyParticipants still uses the evaluate version under the hood

givenRewardTransfer & givenScheduledRewards wrap the stats version

Tests only pull in those descriptive helpers, and only the stats-specific mapper is re-exported from spark-stats-db/test-helpers.js. That way, there’s no longer any ambiguity about which database we’re mapping against. Let me know if you’d prefer alternate names or locations.
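For example, one way the remaining re-export could be made unambiguous (a sketch; the rename is an illustration, not necessarily what the PR does):

// spark-stats-db/test-helpers.js
// Re-export the stats-side mapper under a database-specific name so it can
// never be confused with the spark-evaluate mapper of the same name.
export { mapParticipantsToIds as mapStatsParticipantsToIds }
  from '../observer/lib/map-participants-to-ids.js'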

/** @type {import('@filecoin-station/spark-stats-db').PgPools} */
let pgPools
let pgPools;
Member

This file contains a lot of unrelated changes, can you please revert them?

I think npm run lint:fix may be all you need.


Successfully merging this pull request may close these issues.

Station Public Dashboard - optimise "Leaderboard - FIL Earned"