[WIP] using db-pool library to create a pool of databases #5846

Draft
momentary-lapse wants to merge 87 commits into LemmyNet:main from momentary-lapse:parallel-db-tests

Conversation

@momentary-lapse
Contributor

Addresses: #4979

Comment thread crates/db_schema/src/utils.rs Outdated
.await;

// TODO make compatible with ActualDbPool
db_pool.pull_immutable().await
Contributor Author

I created this WIP PR to share my progress and the issue I'm currently stuck on. The crate I'm using operates on its own structure that wraps connection pools (code), and we have our own ActualDbPool. They are roughly the same, but it's not obvious to me how to correctly convert one to the other.
I had an idea to make ActualDbPool an enum with two variants, RegularPool and ReusablePool, but got stuck trying to adapt things like LemmyContext, which also requires the pool struct to be cloneable (and ReusablePool is not). It also seems like a lot of changes to the main codebase for what is purely a test change.
Do you folks have any ideas on how to handle that? Or should I stick to the initial plan and not use this library?

Member

Our ActualDbPool is just a type alias for deadpool Pool<AsyncPgConnection>.

Their crate should be able to work with deadpool pools, but I'm not familiar with how to plug that into their crate... you'll have to ask them.

Contributor Author

Yeah, I see. I returned to this issue today after a week-long break. I'm in contact with the db-pool author, who is helping me understand a lot of details and is really willing to collaborate, so I think we'll make this work.

I'd like to clarify one point: do we want build_db_pool_for_tests to still return ActualDbPool? db-pool has its own wrapper, ReusableConnectionPool, which works like a deadpool Pool but is a bit different and needs adaptation. It might be easier to adapt the tests to work with ReusableConnectionPool than to convert ReusableConnectionPool to ActualDbPool.

Collaborator

The return type of build_db_pool_for_tests may be changed. Also, a DbPool variant may be added if needed.
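
As a rough illustration of the "add a variant" option, here is a minimal sketch of an enum that lets call sites accept either kind of pool. All names in it (RegularPool, ReusablePool, AnyPool, describe) are hypothetical stand-ins, not the real Lemmy, deadpool, or db-pool types, and the max_size field exists only to give the example something to dispatch on.

```rust
/// Hypothetical stand-ins; the real pool types live in deadpool and db-pool.
struct RegularPool {
    max_size: usize,
}
struct ReusablePool {
    max_size: usize,
}

/// A single type that call sites can accept, whichever pool backs it.
enum AnyPool<'a> {
    Regular(&'a RegularPool),
    Reusable(&'a ReusablePool),
}

impl AnyPool<'_> {
    fn kind(&self) -> &'static str {
        match self {
            AnyPool::Regular(_) => "regular",
            AnyPool::Reusable(_) => "reusable",
        }
    }

    fn max_size(&self) -> usize {
        match self {
            AnyPool::Regular(p) => p.max_size,
            AnyPool::Reusable(p) => p.max_size,
        }
    }
}

impl<'a> From<&'a RegularPool> for AnyPool<'a> {
    fn from(p: &'a RegularPool) -> Self {
        AnyPool::Regular(p)
    }
}

impl<'a> From<&'a ReusablePool> for AnyPool<'a> {
    fn from(p: &'a ReusablePool) -> Self {
        AnyPool::Reusable(p)
    }
}

/// Code under test takes the enum and does not care which variant it gets.
fn describe(pool: AnyPool<'_>) -> String {
    format!("{} pool, max size {}", pool.kind(), pool.max_size())
}

fn main() {
    let regular = RegularPool { max_size: 5 };
    let reusable = ReusablePool { max_size: 30 };
    println!("{}", describe((&regular).into()));
    println!("{}", describe((&reusable).into()));
}
```

With From implemented for both reference types, existing call sites that already write pool.into() would keep compiling in a design like this; whether that carries over to the real types is an assumption.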

Comment thread crates/api/api_utils/src/context.rs Outdated
#[derive(Clone)]
pub struct LemmyContext {
- pool: ActualDbPool,
+ pool: ContextPool,
Contributor Author

This is the point that currently blocks me, and I think it's better to consult with you again. The LemmyContext struct must be cloneable, and therefore so must all of its fields, including the pool. Unfortunately, the reusable pool from the db-pool crate is not, and I don't have access to its fields to implement the trait here.
But before asking the db-pool developer, I'd like to be sure we really need this pool cloning, especially for the tests. Cloning the pool seems a bit strange to me, but I may be missing something. I'm looking at the code now, but maybe you folks already have some insights on this.

Collaborator

Wrap it in Arc for now.
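
For readers following along, a minimal sketch of what wrapping in Arc buys here: a struct can derive Clone even when one of its fields holds a type that is not Clone, because cloning an Arc only bumps a reference count and every clone points at the same value. NonClonePool and Context below are hypothetical stand-ins, not the db-pool or Lemmy types.

```rust
use std::sync::Arc;

/// Stand-in for a pool type that does not implement `Clone`
/// (hypothetical; not the real db-pool type).
struct NonClonePool {
    name: String,
}

/// A context that must be `Clone`; the pool is shared behind an `Arc`.
#[derive(Clone)]
struct Context {
    pool: Arc<NonClonePool>,
}

impl Context {
    fn new(pool: NonClonePool) -> Self {
        Self { pool: Arc::new(pool) }
    }

    /// Borrow the underlying pool; deref coercion turns `&Arc<T>` into `&T`.
    fn pool(&self) -> &NonClonePool {
        &self.pool
    }
}

fn main() {
    let ctx = Context::new(NonClonePool { name: "test-db".to_string() });
    let ctx2 = ctx.clone();

    // Both contexts point at the same pool; nothing was deep-copied.
    assert!(Arc::ptr_eq(&ctx.pool, &ctx2.pool));
    assert_eq!(Arc::strong_count(&ctx.pool), 2);
    println!("both contexts share pool '{}'", ctx2.pool().name);
}
```

An accessor that returns a plain reference, as pool() does in this sketch, keeps the Arc an internal detail so callers do not have to dereference it themselves.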

@momentary-lapse
Contributor Author

Update: I'm still working on this. I can't devote much time to it, but it is slowly moving forward, and I keep the code in the branch up to date. I've connected the db-pool crate to our tests and reworked most of them. I currently have a runtime error, which I plan to look at and fix this week.
After that, what's left is to change a few tests that use the build_db_pool function.

@Nutomic
Member

Nutomic commented Mar 9, 2026

The tests are all passing locally for me (although many are marked as ignored). It makes me wonder why they fail only in CI. From what I can see, both .woodpecker.yml and test.sh are unchanged, so they should run the tests in exactly the same way.

The error shown in CI is "migrations must be managed using lemmy_server instead of diesel CLI". Maybe it's because diesel-cli is somehow used by the db-pool crate. I would try deleting migrations/2025-08-01-000017_forbid_diesel_cli/ and see what happens.

- let pool = &mut pool.into();
+ let pool_arc = data.pool();
+ let pool_ref = &***pool_arc;
+ let pool = &mut pool_ref.into();
Member

This shouldn't be necessary.

@momentary-lapse
Contributor Author

> The tests are all passing locally for me (although many are marked as ignored). It makes me wonder why they fail only in CI. From what I can see, both .woodpecker.yml and test.sh are unchanged, so they should run the tests in exactly the same way.

Yeah, I temporarily ignored the tests that fail on my local machine. But some are still failing in the pipeline, probably because of concurrency-related issues.

I'm trying to find out why the tests are not running in parallel; the two issues might be related. I ran some parallel tests on the db-pool library in isolation, and it works perfectly, so the problem is most definitely in the pool configuration for Lemmy. I now (at least this week) have the capacity to work on the issue more consistently, so I hope my investigation will bear some results.

@dessalines
Member

> The error shown in CI is "migrations must be managed using lemmy_server instead of diesel CLI".

There must be some mis-merge from main somewhere.

@momentary-lapse
Contributor Author

Okay, for now I simply ignored the cleanup errors in db-pool, and the tests are passing. I tried disabling migrations/2025-08-01-000017_forbid_diesel_cli/ as Nutomic suggested, and that got rid of the diesel CLI error, but another one appeared: it tried to truncate the table deps_saved_ddl and failed. That table is in the util schema, so maybe it's a permissions problem. Anyway, I'm planning to look into it more carefully, since right now there's no guarantee that the databases are even cleaned properly. But at least the overall problem has been located.

There's a small speed improvement compared to the main build (~4 min). But I suspect that's because one heavy test is still ignored (lemmy_diesel_utils schema_setup::tests::test_schema_setup), and it feels like something is keeping the tests from running truly in parallel, even though they now definitely interfere with each other: I put serial back on the worker tests in lemmy_apub_send, because without that directive they all tried to use the same port 8085 and failed.

@Nutomic
Member

Nutomic commented Mar 10, 2026

> There's a small speed improvement compared to the main build (~4 min). But I suspect that's because one heavy test is still ignored (lemmy_diesel_utils schema_setup::tests::test_schema_setup), and it feels like something is keeping the tests from running truly in parallel.

How many tests are currently running at the same time? Not sure where that is defined, but try to change the number higher or lower and see if it helps.

> Even though they now definitely interfere with each other: I put serial back on the worker tests in lemmy_apub_send, because without that directive they all tried to use the same port 8085 and failed.

It might be possible to pass the port number as parameter to make them parallel. But there are only a few tests in that crate so it can be changed later in another PR.
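
One common way to avoid a fixed port in parallel tests, sketched here with only the standard library, is to bind to port 0 and let the OS assign a free one. This is a different technique from passing a hardcoded number per test, and whether the lemmy_apub_send worker tests can accept a port or listener chosen this way is an assumption; bind_test_listener is a hypothetical helper.

```rust
use std::io;
use std::net::TcpListener;

/// Bind to port 0 so the OS picks a free port, and return the listener
/// together with the port it was given. Keeping the listener alive holds
/// the port for the duration of the test.
fn bind_test_listener() -> io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}

fn main() -> io::Result<()> {
    // Two "tests" running at the same time each get their own port.
    let (_a, port_a) = bind_test_listener()?;
    let (_b, port_b) = bind_test_listener()?;
    assert_ne!(port_a, port_b);
    println!("test A on port {port_a}, test B on port {port_b}");
    Ok(())
}
```

Returning the listener rather than just the port number avoids a window in which another process could grab the port between discovery and use.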

@momentary-lapse
Contributor Author

momentary-lapse commented Mar 10, 2026

> How many tests are currently running at the same time? Not sure where that is defined, but try to change the number higher or lower and see if it helps.

I hardcoded the pool size for tests to 30 for now. But both pool sizes (restricted and privileged) don't seem to actually affect the test speed at all.

> It might be possible to pass the port number as parameter to make them parallel. But there are only a few tests in that crate so it can be changed later in another PR.

Honestly, I tried giving each test its own port (changed it here and here) and running them together, but they started failing and hanging. I spent some time on it, then switched to other things.

@dessalines
Member


4 participants