
[dagster-airlift][datasets 3/3] dataset filtering #29164


Status: Open. Wants to merge 1 commit into base branch dpeng817/dataset_assets.

dpeng817 (Contributor) commented Apr 10, 2025

Summary & Motivation

Adds filtering options for Airflow datasets:

  • Turn off dataset retrieval entirely.
  • Retrieve dataset information only for producing DAGs that match the rest of the filter.
  • Filter datasets by URI pattern.
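The three options above can be sketched in plain Python. This is a minimal illustration, not the PR's implementation: `DatasetFilterSketch`, `DatasetSketch`, `select_datasets`, and the `retrieve_datasets` and `dag_id_ilike` field names are hypothetical stand-ins; only `dataset_uri_ilike` appears in the diff under review. Matching is approximated as a case-insensitive substring check.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class DatasetFilterSketch:
    """Hypothetical stand-in for the filter options; not the real AirflowFilter."""

    retrieve_datasets: bool = True            # assumed name: master switch for datasets
    dag_id_ilike: Optional[str] = None        # assumed name: filter on producing DAG ids
    dataset_uri_ilike: Optional[str] = None   # name taken from the diff under review


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical minimal dataset record."""

    uri: str
    producing_dag_ids: Sequence[str]


def select_datasets(
    datasets: Sequence[DatasetSketch], flt: DatasetFilterSketch
) -> List[DatasetSketch]:
    # Option 1: retrieval can be switched off completely.
    if not flt.retrieve_datasets:
        return []
    selected = []
    for ds in datasets:
        # Option 3: keep only datasets whose URI contains the fragment (any case).
        if flt.dataset_uri_ilike and flt.dataset_uri_ilike.lower() not in ds.uri.lower():
            continue
        # Option 2: keep only datasets produced by a DAG that passes the DAG filter.
        if flt.dag_id_ilike and not any(
            flt.dag_id_ilike.lower() in dag_id.lower() for dag_id in ds.producing_dag_ids
        ):
            continue
        selected.append(ds)
    return selected
```

With every option left at its default, the sketch returns all datasets unchanged, so the filtering is opt-in.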

How I Tested These Changes

Additional tests.

Changelog

  • [dagster-airlift] The AirflowFilter API has been updated with options to limit the retrieval of datasets from the REST API.

Comment on lines +192 to +195
    if (
        retrieval_filter.dataset_uri_ilike
        and retrieval_filter.dataset_uri_ilike not in dataset.uri
    ):

The current check, retrieval_filter.dataset_uri_ilike not in dataset.uri, is a case-sensitive substring test, so it does not accurately simulate the behavior of SQL's ILIKE operator. For better fidelity with the real implementation, consider case-insensitive pattern matching (e.g., with a regex) or, at minimum, a case-insensitive substring check. Either would more closely match how ILIKE works in the SQL queries used by the production code.

Suggested change
-   if (
-       retrieval_filter.dataset_uri_ilike
-       and retrieval_filter.dataset_uri_ilike not in dataset.uri
-   ):
+   if (
+       retrieval_filter.dataset_uri_ilike
+       and retrieval_filter.dataset_uri_ilike.lower() not in dataset.uri.lower()
+   ):

Spotted by Diamond
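The regex option mentioned in this comment could look roughly like the sketch below. It translates an ILIKE-style pattern (`%` for any run of characters, `_` for exactly one character) into a case-insensitive regular expression. The helper names `ilike_to_regex`, `ilike`, and `ilike_contains` are hypothetical, backslash escape sequences are deliberately not handled, and wrapping the fragment in `%...%` is an assumption about how the server applies the pattern rather than something the diff confirms.

```python
import re


def ilike_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL ILIKE pattern into a compiled, case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # % matches any run of characters, including none
        elif ch == "_":
            parts.append(".")            # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def ilike(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the ILIKE pattern."""
    return ilike_to_regex(pattern).fullmatch(value) is not None


def ilike_contains(value: str, fragment: str) -> bool:
    """Case-insensitive 'contains' check, i.e. ILIKE '%fragment%' (assumed wrapping)."""
    return ilike(value, f"%{fragment}%")
```

For fragments without wildcards, `ilike_contains(dataset.uri, fragment)` gives the same result as the suggested `.lower()` substring check; the regex form only matters once `%` or `_` appear inside the fragment.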


    ) -> Sequence["Dataset"]:
        params: dict[str, Any] = {"limit": limit, "offset": offset}
        if retrieval_filter.dataset_uri_ilike:
            params["uri_pattern"] = retrieval_filter.dataset_uri_ilike

I would prefer the Python name to match the REST API name, e.g. just rename dataset_uri_ilike to dataset_uri_pattern.
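To make the mapping between the Python field and the REST parameter concrete, here is a minimal, self-contained sketch of how the query for a paginated dataset listing might be assembled. The `limit`, `offset`, and `uri_pattern` keys come from the hunk above; the function names, the `/datasets` path, and the example base URL are assumptions for illustration, and no request is actually sent.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def build_dataset_query(
    limit: int, offset: int, uri_pattern: Optional[str] = None
) -> Dict[str, Any]:
    """Build query parameters, adding uri_pattern only when a filter is set."""
    params: Dict[str, Any] = {"limit": limit, "offset": offset}
    if uri_pattern:
        # Same key as in the hunk under review.
        params["uri_pattern"] = uri_pattern
    return params


def dataset_list_url(base_url: str, params: Dict[str, Any]) -> str:
    """Join a (hypothetical) API base URL with the encoded query string."""
    return f"{base_url.rstrip('/')}/datasets?{urlencode(params)}"


# Hypothetical base URL, used only to show the resulting shape.
url = dataset_list_url(
    "http://localhost:8080/api/v1",
    build_dataset_query(limit=100, offset=0, uri_pattern="s3://bucket"),
)
```

Leaving the key out when no pattern is set keeps the unfiltered request identical to what it was before the filter existed.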

@dpeng817 dpeng817 force-pushed the dpeng817/dataset_assets branch from 339f24e to de133c2 Compare April 23, 2025 16:51
@dpeng817 dpeng817 force-pushed the dpeng817/dataset_filtering branch 2 times, most recently from ae9608c to 730f322 Compare April 23, 2025 21:59
@dpeng817 dpeng817 force-pushed the dpeng817/dataset_assets branch from de133c2 to d67e1e0 Compare April 24, 2025 23:46
@dpeng817 dpeng817 force-pushed the dpeng817/dataset_filtering branch from 730f322 to ccef52b Compare April 24, 2025 23:47
@dpeng817 dpeng817 force-pushed the dpeng817/dataset_assets branch from d67e1e0 to 2b29b96 Compare April 25, 2025 00:29
@dpeng817 dpeng817 force-pushed the dpeng817/dataset_filtering branch from ccef52b to fb6bb06 Compare April 25, 2025 00:29
This was referenced Apr 25, 2025

3 participants