-
Notifications
You must be signed in to change notification settings - Fork 147
refactor: Simplify Implementation.to_native_namespace
#2620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The phrasing of this tickled me 😂 @FBruzzesi what if we solved this and closed a related issue in one go? We'd use what you have as We'd also be able to get typing working (at least for the individual cases) |
I personally don't mind but who knows you guys don't have strong opinions on these hacks 😂
Sure, the polars implementation seems easy to adapt as well. What I am not too keen on is that
To my knowledge, @overload
def import_optional(module_name: Literal["pandas"]) -> ?
return __import__(module_name) |
At the end of (#1940 (comment)) ... So you leave each Then you can have a mapping like: # or replace `Implementation` with the literal you resolve in `Implementation.value`
_IMPORT_FUNCTIONS: Mapping[Implementation, Callable[..., ModuleType]] = {
Implementation.PANDAS: import_pandas,
Implementation.POLARS: import_polars,
...: ...,
}
So in isolation we get the typing ( |
Sibling to `no_imports_test`
All usage had these: > error: Call to untyped function "import_pyarrow" in typed context [no-untyped-call]
Turns out `find_spec` raises if the parent is missing
…b.com/narwhals-dev/narwhals into refactor/simplify-to-native-namespace
…b.com/narwhals-dev/narwhals into refactor/simplify-to-native-namespace
The only way I've found to fix this was by removing the https://github.com/narwhals-dev/narwhals/actions/runs/15491689078/job/43618476234?pr=2620 |
@MarcoGorelli @dangotbanned I tried to implement the idea I had mentioned in discord in 96540c3, yet apparently everything breaks apart with ibis, so I am happy to revert unless you guys have some other ideas. The reason to use
|
@FBruzzesi here's a platter of thoughts 😄 This PR
|
def _backend_version(self) -> tuple[int, ...]: | |
native = self.to_native_namespace() | |
into_version: Any | |
if self not in { | |
Implementation.PYSPARK, | |
Implementation.PYSPARK_CONNECT, | |
Implementation.DASK, | |
Implementation.SQLFRAME, | |
}: | |
into_version = native | |
elif self in {Implementation.PYSPARK, Implementation.PYSPARK_CONNECT}: | |
into_version = get_pyspark() # pragma: no cover | |
elif self is Implementation.DASK: | |
into_version = get_dask() | |
else: | |
import sqlframe._version | |
into_version = sqlframe._version | |
return parse_version(into_version) |
Minimum versions (generally)
I totally forgot to follow up on (https://discord.com/channels/1235257048170762310/1276640597284884490/1379470754051002389)
I'll continue this over on that other issue so you've got something to come back to
We have two other methods for enforcing version constraints:
utils.validate_backend_version
- For the absolute minimum
@requires.backend_version
- For method-specific support
If there is another use-case these don't cover, I think it would make the most sense to improve those tools.
This PR would be adding a third way of handling package versions - leading to 3 different error messages for the same issue 😔
Thanks @dangotbanned, I am aware of all the code parts you mentioned. I think
Yes, but it requires and implementation & the version already
We can abstract the body of this I guess we can proceed with one of the two following options:
in this way we could be able to re-use the above mentioned functions/methods |
@dangotbanned I am quite happy with the state of this now. Let me know your thoughts whenever you have the time 🤞🏼 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @FBruzzesi sorry this one took a little longer than usual
narwhals/_utils.py
Outdated
from importlib.metadata import version | ||
|
||
into_version = sqlframe._version | ||
return parse_version(into_version) | ||
distribution_name = _IMPLEMENTATION_TO_DISTRIBUTION_NAME.get(self, self.value) | ||
return parse_version(version=version(distribution_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might end up triggering this downstream issue again with importlib.metadata.version
:
- perf: Avoid module-level
importlib.util.find_spec
#2391 (comment) - No package metadata was found for typing_extensions on plotly import pyinstaller/pyinstaller-hooks-contrib#902 (comment)
- pyinstaller/pyinstaller-hooks-contrib@18caba0
It does have a solution, but I guess we'd need to keep the metadata requirements in sync somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can revert to use __version__
attribute where applicable, _version
for sqlframe, and from importlib.metadata import version
only for ibis (I couldn't spot an attribute storing the library version)
As we discussed in the past, __version__
is not a standard, so not guaranteed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed in the past,
__version__
is not a standard, so not guaranteed
Yeah that's a good point, personally I'd want use to be using importlib.metadata.version
for that reason.
I wonder if the pyinstaller
folks could collect metadata for all dependencies in our pyproject.toml
?
It does seem a bit off to me that we'd need to do things in a non-standardized way for what I can only assume is a downstream perf optimization 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main "issue" is that it would require to import the module in the first place, which is somewhat a looping with to_native_namespace
🤔
An idea would be to move the following block into _backend_version
:
if self is Implementation.UNKNOWN:
msg = "Cannot return native namespace from UNKNOWN Implementation"
raise AssertionError(msg)
from importlib import import_module
module_name = _IMPLEMENTATION_TO_MODULE_NAME.get(self, self.value)
module = import_module(module_name)
and cache the module once loaded to be reused in to_native_namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dangotbanned is from importlib import import_module
ok to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and cache the module once loaded to be reused in
to_native_namespace
I'm definitely in favor of adding some caching into the mix somewhere (#1657 (reply in thread)), we're checking backend_version
waaaaaaay too many times for something that doesn't change.
But I think we can look at that in another issue.
and
from importlib.metadata import version
only for ibis (I couldn't spot an attribute storing the library version)
ibis
seems to have always had a __version__
?
ibis==1.0.0
https://github.com/ibis-project/ibis/blob/33201ac9cdbdafde0ae0dd99960657bc827f056a/ibis/__init__.py#L138ibis==6.0.0
https://github.com/ibis-project/ibis/blob/97173935961231bac391943ba56ab499f4f9c2fe/ibis/__init__.py#L4ibis==10.5.0
https://github.com/ibis-project/ibis/blob/cfa5768b6fb6020f712aadc842c9c35966e2acac/ibis/__init__.py#L5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dangotbanned is
from importlib import import_module
ok to use?
Yeah should be - you'll get the same ModuleNotFoundError
if it isn't installed
>>> import ibis2
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[32], line 1
----> 1 import ibis2
ModuleNotFoundError: No module named 'ibis2'
from importlib import import_module
>>> import_module("ibis2")
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[31], line 3
1 from importlib import import_module
----> 3 import_module("ibis2")
File ~\AppData\Roaming\uv\python\cpython-3.13.2-windows-x86_64-none\Lib\importlib\__init__.py:88, in import_module(name, package)
86 break
87 level += 1
---> 88 return _bootstrap._gcd_import(name[level:], package, level)
File <frozen importlib._bootstrap>:1387, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:1360, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1324, in _find_and_load_unlocked(name, import_)
ModuleNotFoundError: No module named 'ibis2'
tests/dependencies/imports_test.py
Outdated
@pytest.mark.parametrize( | ||
("impl", "frame_constructor", "frame_type", "kwargs"), | ||
[ | ||
(Implementation.CUDF, "DataFrame", "DataFrame", {}), | ||
(Implementation.DASK, "from_dict", "DataFrame", {"npartitions": 1}), | ||
(Implementation.MODIN, "DataFrame", "DataFrame", {}), | ||
(Implementation.PANDAS, "DataFrame", "DataFrame", {}), | ||
(Implementation.POLARS, "DataFrame", "DataFrame", {}), | ||
(Implementation.PYARROW, "table", "Table", {}), | ||
], | ||
) | ||
def test_round_trip( | ||
monkeypatch: pytest.MonkeyPatch, | ||
impl: Implementation, | ||
frame_constructor: str, | ||
frame_type: str, | ||
kwargs: dict[str, Any], | ||
) -> None: | ||
module_name = impl.value | ||
if not find_spec(module_name): | ||
reason = f"{module_name} not installed" | ||
pytest.skip(reason=reason) | ||
|
||
if impl.is_pyarrow() or (sys.version_info >= (3, 10) and impl.is_pandas()): | ||
# NOTE: Can't do `monkeypatch.delitem` safely for other implementations | ||
monkeypatch.delitem(sys.modules, module_name) | ||
|
||
module = impl.to_native_namespace() | ||
df = getattr(module, frame_constructor)(data, **kwargs) | ||
result = _roundtrip_query(df) | ||
|
||
assert isinstance(result, getattr(module, frame_type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous tests, I forgot to add the step that I was eyballing in my IDE - but was actually the motivation for them 🤦♂️
Previous tests
narwhals/tests/dependencies/imports_test.py
Lines 30 to 73 in 214e15a
def test_import_polars(data: dict[str, Any]) -> None: | |
# NOTE: Can't do `monkeypatch.delitem` safely | |
# ImportError: PyO3 modules compiled for CPython 3.8 or older may only be initialized once per interpreter process | |
pytest.importorskip("polars") | |
df = dependencies.import_polars().DataFrame(data) | |
result = _roundtrip_query(df) | |
import polars as pl | |
assert isinstance(result, pl.DataFrame) | |
# NOTE: Don't understand this one | |
@pytest.mark.xfail( | |
reason="'dict' object has no attribute '_meta'.\n" | |
"dask/dataframe/dask_expr/_collection.py:609", | |
raises=AttributeError, | |
) | |
def test_import_dask(data: dict[str, Any]) -> None: | |
pytest.importorskip("dask") | |
df = dependencies.import_dask().DataFrame(data) | |
result = _roundtrip_query(df) | |
import dask.dataframe as dd | |
assert isinstance(result, dd.DataFrame) | |
def test_import_pandas(monkeypatch: pytest.MonkeyPatch, data: dict[str, Any]) -> None: | |
pytest.importorskip("pandas") | |
monkeypatch.delitem(sys.modules, "pandas") | |
df = dependencies.import_pandas().DataFrame(data) | |
result = _roundtrip_query(df) | |
import pandas as pd | |
assert isinstance(result, pd.DataFrame) | |
def test_import_pyarrow(monkeypatch: pytest.MonkeyPatch, data: dict[str, Any]) -> None: | |
pytest.importorskip("pyarrow") | |
monkeypatch.delitem(sys.modules, "pyarrow") | |
df = dependencies.import_pyarrow().table(data) | |
result = _roundtrip_query(df) | |
import pyarrow as pa | |
assert isinstance(result, pa.Table) |
If we used polars
as an example, I was trying to make sure the typing survived a roundtrip.
I could see that out on hover, but the test should've been:
def test_import_polars(data: dict[str, Any]) -> None:
pytest.importorskip("polars")
df = dependencies.import_polars().DataFrame(data)
result = _roundtrip_query(df)
import polars as pl
if TYPE_CHECKING:
from typing_extensions import assert_type
assert_type(result, pl.DataFrame)
assert isinstance(result, pl.DataFrame)
That style is what I wrote in the ...
Namespace
tests
narwhals/tests/namespace_test.py
Lines 120 to 132 in e831be1
def test_namespace_from_numpy_polars() -> None: | |
pytest.importorskip("polars") | |
pytest.importorskip("numpy") | |
import numpy as np | |
import polars as pl | |
arr: _2DArray = cast("_2DArray", np.array([[5, 2, 0, 1], [1, 4, 7, 8], [1, 2, 3, 9]])) | |
columns = "a", "b", "c" | |
frame = Namespace.from_backend("polars").compliant.from_numpy(arr, columns).native | |
if TYPE_CHECKING: | |
assert_type(frame, pl.DataFrame) | |
assert isinstance(frame, pl.DataFrame) |
Since the PR doesn't have typing anymore, I'm not sure the test makes much sense as we're effectively just checking:
- Does
find_spec
work? - Can we construct a dataframe via
import_module
? - Once we have a dataframe, can we use
from_native
andto_native
?
I think the other tests you've added give us the coverage for the new behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you would you be ok with just leave the tests test_to_native_namespace
and test_to_native_namespace_unknown
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah as long as they're all we need for coverage
What's up with pyarrow nightly 🥲 |
Same! I'm seeing that recently in #2572 as well |
Implementation.to_native_namspace
Implementation.to_native_namespace
narwhals/_utils.py
Outdated
|
||
As a biproduct of loading the native namespace, we also store it as an attribute | ||
under the `_native_namespace` name. | ||
""" | ||
if self is Implementation.UNKNOWN: # pragma: no cover | ||
msg = "Cannot return backend version from UNKNOWN Implementation" | ||
raise AssertionError(msg) | ||
|
||
from importlib import import_module | ||
|
||
module_name = _IMPLEMENTATION_TO_MODULE_NAME.get(self, self.value) | ||
self._native_namespace = import_module(module_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to store modules on enums (or any state for that matter)
To be super clear, my comment in (#2620 (comment)) was meant to mean let's defer caching to another issue.
I was just about to merge and was pretty surprised to see this added in a commit titled rollback _backend_verson.
If we need to add caching of module loading, I would much prefer to just lift the _LazyModule
stuff from polars
- since that has typing support built in (https://github.com/pola-rs/polars/blob/7cb263a6dc4e3a1893b6ad4af963733c83268b60/py-polars/polars/dependencies.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to avoid repeating the exact same imports to check the backend version and return the module.
I am a bit in a rush but I just made a commit to factor out and cache in a helper function.
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Not sure how likable this is (see discord)