-
Notifications
You must be signed in to change notification settings - Fork 4
Optimize DB Schema & Query for Top-Earning Leaderboard #340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Optimize DB Schema & Query for Top-Earning Leaderboard #340
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, @Hany-Almnaem ! 🚀 Thanks for your submission! 🙏🏻
I'm wondering if we should reduce the number of migrations by combining some of them into a single file. For example, we could merge new table creation with backfilling or group the changes to daily_scheduled_rewards and daily_reward_transfers tables along with their backfilling. What do you think?
Let's also wait for feedback from others, but overall, this is looking great! 👍🏻
@pyropy Thanks for your feedback!
It's a great Idea to reduce the number of migrations. but I recommend keeping each logical change in its own migration – it’s usually easier to debug and revert that way. However, combining steps is possible if it won’t cause issues later. In general, small, incremental migrations are the safest bet. |
@Hany-Almnaem thank you for the pull request. To clarify: is this superseding your earlier PR #324? Can we close #324 now? There are several failed CI checks, please take a look and fix them. (You should be able to reproduce them locally by running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The high-level direction looks good to me 👍🏻
Let's discuss the implementation details now.
@Hany-Almnaem I have a question about the performance.
That looks like a step in the wrong direction to me. We want to improve the performance of this query, not make it worse.
|
@bajtos You alright, |
Supersedes Earlier PR #324 Summary of Changes
Aligned the schema with spark-evaluate to keep everything consistent and reduce confusion.
Merged multiple smaller migrations into one, adding detailed comments to explain each step and keep the migration history concise.
Dropped legacy columns that are no longer needed after the new schema changes.
Added Indexes for improved query performance when filtering by day and address.
Adjusted existing tests to match the new schema, ensuring they accurately reflect the latest logic and structures.
Switched to using participant IDs instead of participant addresses in joins, and leveraged the new indexes for faster lookups. With these changes, we achieve a more efficient database schema, better performance on large datasets, and clearer migration steps. Feel free to review and let me know if anything needs further adjustment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned up the first part of the patch, see the commits above. I need to take a closer look at the second part.
observer/test/observer.test.js
Outdated
{ args: { to: 'address1', amount: 250 }, blockNumber: 2000 } | ||
{ args: { to: 'address1', amount: 150 }, blockNumber: 2000 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, let's revert.
run the indexing process in background Co-authored-by: Miroslav Bajtoš <[email protected]>
Signed-off-by: Miroslav Bajtoš <[email protected]>
Signed-off-by: Miroslav Bajtoš <[email protected]>
Signed-off-by: Miroslav Bajtoš <[email protected]>
Signed-off-by: Miroslav Bajtoš <[email protected]>
490af82
to
4081ba7
Compare
![]() @Hany-Almnaem please don't force-push to pull requests, it makes it more difficult to incrementally review only what's changed since the last review. We use merge commits to bring new changes from the |
Signed-off-by: Miroslav Bajtoš <[email protected]>
stats/test/app.test.js
Outdated
await pgPools.stats.query(` | ||
INSERT INTO participants (id, participant_address) | ||
VALUES (1, '0x20'), (2, '0x00') | ||
`) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use mapParticipantsToIds
instead of hard-coded participant ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend creating a helper that will ensure the address is mapped to an ID and then call the sql query to insert into daily_scheduled_rewards
.
stats/test/platform-routes.test.js
Outdated
await pgPools.stats.query(` | ||
INSERT INTO participants (id, participant_address) | ||
VALUES | ||
(1, 'to1'), | ||
(2, 'to2'), | ||
(3, 'to3'), | ||
(4, 'address1'), | ||
(5, 'address2'), | ||
(6, 'address3') | ||
`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - use mapParticipantsToIds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to prepare this mapping in advance? I would expect givenDailyRewardTransferMetrics
to take care for that.
stats/test/platform-routes.test.js
Outdated
await pgPools.stats.query(` | ||
INSERT INTO participants (id, participant_address) | ||
VALUES | ||
(1, 'to1'), | ||
(2, 'to2'), | ||
(3, 'to3'), | ||
(4, 'address1'), | ||
(5, 'address2'), | ||
(6, 'address3') | ||
`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
I recommend creating a helper that will update both participants
and daily_scheduled_rewards
tables, e.g. givenDailyScheduledRewards(day, participantAddress, scheduledRewards)
.
stats/test/platform-routes.test.js
Outdated
const participantResult = await pgPoolStats.query( | ||
'SELECT id FROM participants WHERE participant_address = $1', | ||
[transfer.toAddress] | ||
) | ||
|
||
if (participantResult.rows.length === 0) { | ||
throw new Error(`Participant address ${transfer.toAddress} not found`) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use mapParticipantsToIds
to map new addresses to participant ids, so that we don't have to do it manually in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates, @Hany-Almnaem. I appreciate your perseverance!
I did a bit of cleanup in 2475264 to speed things up.
I identified three more areas in tests that need improving to use mapParticipantsToIds
, please take a look at my comments above.
…ers and mapParticipantsToIds
@bajtos Thanks for your earlier feedback and the cleanup. Much appreciated! I noticed the opportunity to unify the helper functions, but I decided to keep them separate for now to preserve clarity between different test contexts. Just wanted to call that out in case it comes up. happy to refactor further if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db/test-helpers.js
Outdated
[day, id, amount, lastCheckedBlock], | ||
); | ||
}; | ||
export { mapParticipantsToIds } from "../observer/lib/map-participants-to-ids.js"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it confusing to have two functions called mapParticipantsToIds
. Can we find a way to discriminate between the function to map participants in spark_evaluate
database and the function to map participants in spark_stats
database?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bajtos, thanks for flagging that.
In the last commit, I’ve now stopped importing mapParticipantsToIds directly in tests and wrapped each use-case in its own helper:
givenDailyParticipants still uses the evaluate version under the hood
givenRewardTransfer & givenScheduledRewards wrap the stats version
Tests only pull in those descriptive helpers, and only the stats-specific mapper is re-exported from spark-stats-db/test-helpers.js. That way, there’s no longer any ambiguity about which database we’re mapping against. Let me know if you’d prefer alternate names or locations.
stats/test/app.test.js
Outdated
/** @type {import('@filecoin-station/spark-stats-db').PgPools} */ | ||
let pgPools | ||
let pgPools; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file contains a lot of unrelated changes, can you please revert them?
I think npm run lint:fix
may be all you need.
… helpers with unified test-helpers
Refactor DB schema: Add participant_id & foreign keys for performance
Changes Made:
Performance:
Migration Steps:
new migration files/
├── ...
├── 008.do.create-address-mapping-table.sql
├── 009.do.backfill-address-mapping.sql
├── 010.do.add-participant_id-columns.sql
├── 011.do.populate-participant-ids.sql
├── 012.do.add-foreign-keys-and-indexes.sql
Please review and let me know if any further changes are needed.
Close CheckerNetwork/roadmap#178