Thoughts on monorepos (dependency isolation) #4147
Replies: 11 comments 12 replies
-
One request that comes up from time to time is "1 Kedro project, different groups of dependencies". At the moment, 1 Kedro project = 1 Python package, therefore having completely disjoint dependencies is not possible. There are several possible solutions:
-
Beyond the "different groups of dependencies" use case, today a user asked about having different packages under
I tried this with both the setuptools and PDM build backends and it worked like a charm (hatchling, on the other hand, choked). Sample
However, I'm sure Kedro has some hardcoded logic that looks into a specific path inside
-
Of note: PEP 735 (Dependency Groups) has now been accepted. https://peps.python.org/pep-0735/
-
Useful ecosystem overview of how other languages do this: https://gist.github.com/konstin/6d04f111563641beb10facb617fe0eb3
-
Yet another user concerned about the 1 Kedro pipeline = 1 set of requirements: https://kedro.hall.community/evaluating-kedro-for-data-engineering-processes-qFYCwWk5VKQh
-
Originally posted by @deepyaman in #4319 (comment)
-
@datajoely says:
-
Crazy idea for the "conflicting dependencies" problem: a modified

```python
# pipeline_registry.py
# from kedro.framework.project import find_pipelines
from kedro_monorepo.util import find_pipelines  # <-------------
...
def register_pipelines() -> dict[str, Pipeline]:
    # Unchanged
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```

but this

The only missing bit would be having some tooling to make this easier, like
-
uv solves the dependency isolation issue for Bruin by just running everything in isolated, ephemeral environments: https://www.linkedin.com/posts/burakkarakan_theres-no-other-tool-in-the-market-that-activity-7325440443893092353-rbFu?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAXcavUB6j26_YjUtYinmzmdjTchguHHuG4
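To make the idea concrete for Kedro, here is a rough sketch of building one `uv run` command per pipeline, each with its own extra requirements. This is not how Bruin implements it. The pipeline names and the `PIPELINE_REQUIREMENTS` mapping are hypothetical, and it assumes `uv run --isolated --with <requirement>` and `kedro run --pipeline <name>` behave as documented for the versions you have installed.

```python
import shlex


def uv_run_command(pipeline: str, extra_requirements: list[str]) -> list[str]:
    """Build a `uv run` invocation that runs one Kedro pipeline in a fresh environment."""
    command = ["uv", "run", "--isolated"]
    for requirement in extra_requirements:
        # Each --with adds a requirement on top of the project's own dependencies.
        command += ["--with", requirement]
    command += ["kedro", "run", "--pipeline", pipeline]
    return command


# Hypothetical mapping of pipeline names to the extra requirements each one needs.
PIPELINE_REQUIREMENTS = {
    "training": ["torch"],
    "reporting": ["matplotlib"],
}

commands = {
    name: uv_run_command(name, requirements)
    for name, requirements in PIPELINE_REQUIREMENTS.items()
}
for command in commands.values():
    # Only printed here; to execute, pass the list to subprocess.run(command, check=True).
    print(shlex.join(command))
```

Since the commands are only built and printed, the sketch runs without uv or Kedro installed. The trade-off of this approach is that pipelines with conflicting dependencies can no longer share a single in-process session, so data has to be handed over through the catalog.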
-
Hi @astrojuanlu, thanks for the insights above. I have a Kedro project where different pipelines (or parts of them) run on heterogeneous infrastructure: some on AzureML with GPU/CUDA support, others on CPU-only CI/VMs. Naturally, some parts of the project depend on GPU libraries like torch, which aren’t available or necessary in all environments. I want to avoid import-time errors/warnings when these libraries aren’t installed or CUDA devices aren’t available. To work around this, I’m using a pattern like:

```python
def gpu_check():
    try:
        import torch  # imported lazily so CPU-only environments don't fail at import time
    except ImportError:
        return False
    return torch.cuda.is_available()
```

Would you consider this an acceptable practice within Kedro's design? Also, would you recommend managing these environment-specific dependencies via [project.optional-dependencies] in pyproject.toml (e.g., gpu, cpu, etc.), or is it better to go further and split into separate Kedro projects/packages if things get more complex? I want to keep a clean, maintainable structure that avoids these import-time issues across environments. Thanks!
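On the `[project.optional-dependencies]` question, one possible way to pair an extra with a lazy import is sketched below. `optional_import` and `pick_device` are hypothetical helpers, not Kedro APIs, and the `gpu` extra name is only assumed from the example in the question.

```python
import importlib
import importlib.util


def optional_import(module_name: str, extra: str):
    """Import a module lazily, or fail with a hint about which extra provides it."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed; install the '{extra}' extra to use this node"
        )
    return importlib.import_module(module_name)


def pick_device() -> str:
    """Return 'cuda' when torch and a CUDA device are both available, else 'cpu'."""
    try:
        torch = optional_import("torch", extra="gpu")
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

Nodes that strictly need the GPU stack can call `optional_import` directly and let the error surface, while nodes that can degrade gracefully go through something like `pick_device`. Either way nothing GPU-specific is imported at module import time.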
-
Another problem worth looking at while solving this is how to help teams separate the business logic from the Kedro logic even more. Some users put Kedro logic inside their node functions because they want to do dynamic or unconventional things. However, this makes their code more Kedro-dependent and less reusable.
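A small sketch of the separation being described, assuming a toy cleaning step: the function, dataset names, and file split are all hypothetical. The business function has no Kedro imports, and only the pipeline factory touches `kedro.pipeline`.

```python
# business_logic.py: plain Python with no Kedro imports, reusable and unit-testable anywhere.
def drop_invalid_prices(rows: list[dict]) -> list[dict]:
    """Keep only the rows that have a strictly positive price."""
    return [row for row in rows if row.get("price", 0) > 0]


# pipeline.py: the only place that knows about Kedro.
def create_pipeline():
    # Imported inside the function only so this sketch runs without Kedro installed.
    from kedro.pipeline import Pipeline, node

    return Pipeline(
        [
            node(
                drop_invalid_prices,
                inputs="raw_prices",  # hypothetical catalog entry
                outputs="valid_prices",  # hypothetical catalog entry
                name="drop_invalid_prices",
            )
        ]
    )


sample = [{"sku": "a", "price": 3.5}, {"sku": "b", "price": 0}, {"sku": "c"}]
print(drop_invalid_prices(sample))
```

With this split, the business function could live in a separate package with its own dependencies, which is one of the things a monorepo layout would need to support.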
-
With the deprecation of micropackaging #3854, it would be good to explore monorepo approaches. What "monorepo" actually means depends on the use case.