fixes leaking datasets tests #2730

Merged
rudolfix merged 26 commits into devel from chores/fixes-leaking-datasets-tests
Jun 11, 2025

@rudolfix rudolfix commented Jun 7, 2025

Description

  1. stores references to all active pipelines (on demand) and drops them after each test
  2. properly mocks local_dir in tests so local files automatically end up in _storage (no more *duckdb databases left behind after tests)
  3. removes concurrent access to resolved config traces (parallel tests were sometimes failing)
  4. improves the code that opens duckdb and motherduck connections (opened connections were leaked when setting configs on the connection failed, etc.)
  5. finally adds a full set of settings for the duckdb connection, and a way to move the connection to ibis with the settings applied

more in commits
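
Item 4 describes connections that stayed open when a later setup step failed. Below is a minimal sketch of the "close on failed setup" pattern, not the code from this PR. It uses the standard-library `sqlite3` as a stand-in for duckdb so that it runs without extra dependencies; `open_configured_connection`, `setup_statements` and the `connect` parameter are hypothetical names.

```python
import sqlite3
from typing import Callable, Sequence


def open_configured_connection(
    path: str,
    setup_statements: Sequence[str],
    connect: Callable[[str], sqlite3.Connection] = sqlite3.connect,
) -> sqlite3.Connection:
    """Open a connection and run setup statements; close it if any of them fails.

    Hypothetical helper: sqlite3 stands in for duckdb here.
    """
    conn = connect(path)
    try:
        for statement in setup_statements:
            # stand-in for applying connection settings one by one
            conn.execute(statement)
    except Exception:
        # without this close, the half-configured connection would leak
        conn.close()
        raise
    return conn


# usage: a valid setting is applied and the open connection is returned
conn = open_configured_connection(":memory:", ["PRAGMA foreign_keys = ON"])
print(conn.execute("PRAGMA foreign_keys").fetchone()[0])
conn.close()
```

The caller only ever receives a fully configured connection; on any failure the original exception is re-raised after the handle is closed.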

netlify bot commented Jun 7, 2025

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 62dfaa7
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/684985508ded210008b044fe
😎 Deploy Preview https://deploy-preview-2730--dlt-hub-docs.netlify.app

@rudolfix rudolfix force-pushed the chores/fixes-leaking-datasets-tests branch from 549c5bb to 19b1139 Compare June 8, 2025 00:04
@rudolfix rudolfix marked this pull request as ready for review June 9, 2025 16:34
@rudolfix rudolfix requested a review from sh-rp June 9, 2025 16:34
@rudolfix rudolfix mentioned this pull request Jun 10, 2025
@@ -92,14 +94,12 @@ def open_connection(self) -> duckdb.DuckDBPyConnection:
         if first_connection:
             # TODO: we need to frontload the httpfs extension for abfss for some reason
             if self.is_abfss:
-                self._conn.sql("INSTALL https; LOAD httpfs;")
+                self._conn.sql("INSTALL httpfs; LOAD httpfs;")

Collaborator

oh :)

Collaborator Author

they were aliases it seems!

@@ -167,15 +167,50 @@ def test_storage() -> FileStorage:


 @pytest.fixture(autouse=True)
-def autouse_test_storage() -> FileStorage:
-    return clean_test_storage()
+def autouse_test_storage(request) -> Optional[FileStorage]:

Collaborator

this is maybe out of scope, but all of these fixtures need good names and docstrings and should be added to the docstring linter. I often have to look up in the code what exactly they do when I need to disable something for testing. This one is a good example: if you have never looked it up, the name autouse_test_storage gives no real hint about what it does.

Collaborator Author

right! we'll dedicate a week to refactor our utils and release them as OSS lib. version in dlt+ has nice docstrings btw.

    if "no_load" in request.keywords:
        # always deactivate
        Container()[PipelineContext].deactivate()
        Container()[PipelineContext].clear_activation_history()

Collaborator

hm, maybe we want to keep the activation history here so we can clean up with drop_active_pipeline_data after a bunch of "no_load" tests have run without having to store the pipelines?

Collaborator Author

I added this because most of the no_load tests produce empty or fake pipelines, i.e. with destinations that are not instantiated, so they were leaking into subsequent tests and breaking them on cleanup. I'll keep it until we have a good case.
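
The trade-off in this thread can be illustrated with a small self-contained model. This is only a sketch: `FakePipeline`, `ActivationTracker` and `cleanup_after_test` are hypothetical names, not dlt classes. The tracker remembers every activated pipeline, and teardown either drops them all or, for "no_load" tests, just forgets them.

```python
from typing import List, Optional


class FakePipeline:
    """Hypothetical stand-in for a pipeline that owns some test data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.dropped = False

    def drop(self) -> None:
        self.dropped = True


class ActivationTracker:
    """Hypothetical model of a context that records each activated pipeline."""

    def __init__(self) -> None:
        self._active: Optional[FakePipeline] = None
        self._history: List[FakePipeline] = []

    def activate(self, pipeline: FakePipeline) -> None:
        self._active = pipeline
        if pipeline not in self._history:
            self._history.append(pipeline)

    def deactivate(self) -> None:
        self._active = None

    def clear_activation_history(self) -> None:
        self._history.clear()

    def drop_all(self) -> List[str]:
        """Drop every pipeline seen so far, then forget them."""
        dropped = []
        for pipeline in self._history:
            pipeline.drop()
            dropped.append(pipeline.name)
        self.deactivate()
        self.clear_activation_history()
        return dropped


def cleanup_after_test(tracker: ActivationTracker, no_load: bool) -> List[str]:
    """Teardown with the two branches discussed in this thread."""
    if no_load:
        # fake pipelines: forget them without touching any destination
        tracker.deactivate()
        tracker.clear_activation_history()
        return []
    return tracker.drop_all()
```

In this model, clearing the history in the "no_load" branch is what keeps a fake pipeline from showing up in the cleanup of the next regular test.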

@dataclass
class WithLocalFiles(BaseConfiguration):
    """Mixin to BaseConfiguration that shifts relative locations into `local_dir` and allows for a few special locations.
    :pipeline: in the pipeline working folder

Collaborator

I know this was already like this, but why not "pipeline_working_dir"? This way it is clear what this variable means.

Collaborator Author

hmmm you can add a QoL ticket for that. I like :pipeline: because it is like :memory:
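
For readers unfamiliar with the idea, here is a rough sketch of how such location resolution could work. It is an illustration of the docstring above, not dlt's implementation: `resolve_local_path`, the `default_name` argument, and the exact handling of `:memory:` and absolute paths are all assumptions.

```python
import os


def resolve_local_path(
    location: str,
    local_dir: str,
    pipeline_working_dir: str,
    default_name: str = "pipeline.duckdb",  # hypothetical default file name
) -> str:
    """Map a configured location to a concrete path (hypothetical helper)."""
    if location == ":memory:":
        # assumption: in-memory marker is passed through unchanged
        return location
    if location == ":pipeline:":
        # special location: a file inside the pipeline working folder
        return os.path.join(pipeline_working_dir, default_name)
    if os.path.isabs(location):
        # assumption: absolute paths are left alone
        return location
    # relative locations are shifted into local_dir
    return os.path.join(local_dir, location)


# usage: a relative name lands in local_dir, e.g. the test _storage folder
print(resolve_local_path("data.duckdb", "_storage", os.path.join("pipelines", "p1")))
```

With `local_dir` mocked to the test storage folder, every relative location resolves under it, which matches item 2 of the description.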

@rudolfix rudolfix merged commit f821d21 into devel Jun 11, 2025
53 of 55 checks passed
@rudolfix rudolfix deleted the chores/fixes-leaking-datasets-tests branch June 11, 2025 20:17

2 participants