feat: Introduce testing.constructors module and a pytest plugin#3552
feat: Introduce testing.constructors module and a pytest plugin#3552FBruzzesi wants to merge 74 commits into
testing.constructors module and a pytest plugin#3552Conversation
|
Just a quick one from me RE:
Is it possible to add something like this, and keep the imports in place (temporarily)?: # tests.utils.py
from narwhals.testing.typing import Constructor as Constructor , ConstructorEager as ConstructorEagerI really REALLY want this PR, but the diff 😳 (yes yes, I know I'm guilty of this too 😉) |
|
@dangotbanned reverted such commit 😉 Coverage is not picked up in CI though 🥹 |
camriddell
left a comment
There was a problem hiding this comment.
Overall, I'm much happier with this version as you eliminated the reliance of string checking for class attributes that may/may not exist and am implicit registration mechanism.
A few minor comments/food-for-thoughts
| is_nullable=is_nullable, | ||
| needs_gpu=needs_gpu, | ||
| ) | ||
| cls._registry[name] = inst |
There was a problem hiding this comment.
Any preference to do something if the key is already set?
- Raise an error
- emit a warning
- let the user replace whatever was there before (current)
I think the current is fine, just wanted to see if you had thoughts here.
There was a problem hiding this comment.
I think emitting a warning would be ok
| cls.needs_gpu = needs_gpu | ||
|
|
||
| if "name" not in cls.__dict__: | ||
| return |
There was a problem hiding this comment.
Ah, the implementation I saw was different from what you had linked to! There are some differences, but to your point we're on the path to an improvement :)
Happy to have a chat about the design on another channel (Discord?) or perhaps the next iteration of the test suite! 😆
| implementation=Implementation.PANDAS, | ||
| requirements=("pandas", "pyarrow"), | ||
| is_eager=True, | ||
| ) |
There was a problem hiding this comment.
If duplication is a concern, we can have a secondary mechanism register_with. I'm okay with this volume of duplication so will leave this decision to you :)
class frame_constructor:
...
def register_with(self, name, ...):
def dec(func):
... # register and instantiate
from copy import replace # I don't think this is supported by all of our python versions :(
return replace(self, name=name, func=func, ...)
return dec|
@FBruzzesi I still need to look at the rest of the suite (so far only dug into |
for more information, see https://pre-commit.ci
|
Hey everyone @dangotbanned @camriddell @MarcoGorelli @EdAbati 👋🏼 I know the diff is quite large - I hope it's manageable, but please let me know how I can make it easier to review (I am going to update the current description soon). Unlike other past cases, I don't see how this could be split down into multiple PRs. For context, there are a few followups to fully dogfooding on this PR, e.g. feat/testing-constructors-xfail, yet I am not planning to work on it before this one is at least somehow validated |
|
I'd be very happy to take a look and thank you again for working on this! Today I'm busy but have some time tomorrow :) |
One thing that could make it easier is if the description was interspersed with examples. E.g.
|
Description
TL;DR
tests/conftest.pyinto a first class, documented module (narwhals.testing.constructors) and ships it as an installable pytest plugin (registered via thepytest11entry point). Downstream libraries that build on Narwhals can get parametrised dataframe/lazyframe fixtures, without copy pasting our conftest.pytest-covin favour of acoverage run -> combine -> reportflow), because registering an entry point plugin changes import timing in a way that breaks coverage measurement.Motivation
constructor,constructor_eager, the--constructorsflag) were Narwhals-internal, undocumented, and trapped insidetests/conftest.py. Downstream projects that wanted to test against the same backend matrix had to vendor our conftest.testingmodule to enable downstream libraries testing #739 * not marking it as closing, since the goal is to fully dogfood the plugin in our own suite first.__name__sniffing ("pandas" in str(constructor)) for backend detection. Moving to a typedframe_constructorobject with explicit metadata (implementation, requirements, eager/lazy, nullability, GPU) makes that introspection first class.What changed
1. New module:
narwhals/testing/constructors.pyThe core abstraction is
frame_constructor, a callable wrapper around a backend builder function.dictinto a native frame.implementation,requirements,is_eager,is_nullable,needs_gpu.@frame_constructor.register(...)decorator, which instantiates the wrapper and stores it in a shared class level_registry.is_*properties delegate toImplementation(is_pandas,is_polars,is_pandas_like,is_spark_like,is_duckdb,is_lazy,needs_pyarrow,is_available, etc.), replacing the old string sniffing.__call__(obj, namespace, **kwds)builds the native frame and then wraps it vianamespace.from_native(...), so the fixture returns a Narwhals frame directly.Module level helpers (all part of
__all__):available_backends()/available_cpu_backends(): names whose libraries are importable.get_backend_constructor(name): typed lookup, raisesValueErrorwith the valid set on a bad name.prepare_backends(include=..., exclude=...): filtered, sorted selection (excludetakes precedence).is_backend_available(*packages): thinimportlib.util.find_speccheck.pyspark_session()/sqlframe_session(): session factories (moved out oftests/utils.py).DEFAULT_BACKENDS: the historical default subset (pandas,pandas[pyarrow],polars[eager],pyarrow,duckdb,sqlframe,ibis).2. New module:
narwhals/testing/pytest_plugin.pyThe installable pytest plugin, wired up in
pyproject.toml:exposes
Fixtures (auto-parametrised against the selected, installed backends):
nw_framenw_dataframenw_lazyframenw_pandas_like_frameCLI options (under a
narwhalsoption group):--nw-backends=pandas,polars[lazy],duckdb: comma separated selection. Defaults toDEFAULT_BACKENDSintersected with what is installed.--all-nw-backends: every installed CPU backend (overrides--nw-backends; excludesmodinandpyspark[connect]).--use-external-nw-backend: escape hatch so a downstream suite can register its own parametrisation; the plugin still adds the options but stops parametrising the fixtures.NARWHALS_DEFAULT_BACKENDSoverrides the default list (useful e.g. undercudf.pandas).3. Supporting type aliases:
narwhals/testing/typing.pyPublic aliases for downstream type hints:
Data(the column oriented input mapping),FrameConstructor,DataFrameConstructor,LazyFrameConstructor, and theNarwhalsNamespaceprotocol (anything exposingfrom_native).4. Test suite migration
The test suite partially moved off the old pattern to keep the changes more contained.
constructor,constructor_eager, andconstructor_pandas_likeare kept as aliases over the newnw_*fixtures, wrapped in_PatchedFrameConstructor/_PatchedDataFrameConstructorproxies that defaultnamespacetonarwhals. This lets the legacy call sites keep working during the transition. ATODOmarks their eventual removal.constructor(data)instead ofnw.from_native(constructor(data))), call sites simplified. Example fromjoin_test.py: thefrom_native_lazy(constructor(df))helper collapsed toconstructor(df).lazy().tests/utils.py: theConstructor/ConstructorEager/ConstructorPandasLiketype aliases are now re-exported from the conftest proxies; thepyspark_session/sqlframe_sessionhelpers moved intonarwhals.testing.constructors.uses_pyarrow_backendnow keys offstr(constructor)instead of__name__.tests/testing/constructors_test.py(registry, properties,prepare_backends, equality/hash) andtests/testing/plugin_test.py(uses pytest'spytesterto assert the plugin parametrises eager/lazy fixtures and that--use-external-nw-backenddisables parametrisation).5. Coverage and CI rework
Registering the entry point plugin forces
narwhals/__init__.pyto import early (beforepytest-covinstalls its tracer), which silently dropped coverage. Fix:pytest-covdependency.coverage run -m pytest ... && coverage combine && coverage report, so the tracer starts at interpreter startup before any import.execv/forkpatches in[tool.coverage.run].execvandforkare unsupported on Windows (coverage raises), so Windows CI jobs collapse those tosubprocessvia theCOVERAGE_PATCH_EXECV/COVERAGE_PATCH_FORKenv vars; coverage dedupes the final list.COVERAGE_PROCESS_START=pyproject.tomlis set across the workflows.Tooling changes:
run-ci-coverageMake target (run -> combine -> report, with optionalCOV_SOURCEto scope coverage, e.g.src/narwhals/_spark_like).run-cistays for non coverage runs (doctests, narrow-dep, tpch, ibis, modin).--constructorsto--nw-backendsand--all-cpu-constructorsto--all-nw-backends.utils/import_check.py: atestingallowlist permits the deliberate lazy imports of every backend inside the constructors.utils/sort_api_reference.py:testingadded to the skip set (its API doc is hand ordered).docs/api-reference/testing.mddocumenting fixtures, options, and a quick start.CONTRIBUTING.mdupdated for the renamed flags.Public API surface
Newly importable and documented:
narwhals.testing.frame_constructor(and theconstructorsmodule helpers above).narwhals.testing.typingaliases.nw_frame,nw_dataframe,nw_lazyframe,nw_pandas_like_frameand the--nw-*options.Quick start (downstream usage)
pytest --nw-backends="pandas,polars[lazy]" pytest --all-nw-backendsOut of scope
nw_series_constructorfixture. Ergonomically nice, but a separate feature request.constructor/constructor_eager/constructor_pandas_likecompatibility shims. Tracked by aTODO; they stay until every test calls thenw_*fixtures directly.Old description
This PR introduces two new features:
tests/conftest.pyintonarwhals/testing/constructorsnw.from_nativeright after.nw.from_nativekwargs -> For this it's enough to provide it as different proper arguments in__call__, e.g.nw_kwargs,backend_kwargs.Version.Xor the namespace{nw, nw_v1, nw_v2}? If we do so, I would argue that we must enforce it and never assume for the user. In my opinionnw_frame_constructor(data, version=nw)would still look better thannw.from_native(nw_frame_constructor(data)).Update: feat: Let constructor fixtures return narwhals object directly #3569
constructorandconstructor_eageris not really intuitive. I am happy to brainstorm better names for downstream users. And similarly the flags--all-cpu-constructors,--constructors. (*)Update: refactor: Rename exposed fixtures and pytest options #3556
nw_lazy_constructorfor "free"nw_series_constructor). I know it's possible to make a dataframe and then get a column. But it would definitely improve users ergonomics. However, that's feature request/improvement and out of scope of this PR(*) Renamed in #3556 and #3569
nw_frame,nw_dataframe,nw_lazyframe--nw-all-backends(misleading that has no CPU in it?),--nw-backends=.... I think the term constructor might be be too vague, and there is no other case in the codebase. While we have a bunch of cases in which we allow to pass a backendpyest-cov changes
The
[project.entry-points.pytest11]registers the pytest plugin which we dogfood in the testsuite here. Whenpyteststarts, it loads all entry-point plugins beforepytest-covinstalls its coverage tracer.Importing
narwhals.testing.pytest_pluginforces Python to loadnarwhals/__init__.py(to resolve the package), which eagerly imports the entire public API.On main, no entry point existed, so no narwhals code loaded before coverage started.
The fix:
pytest-covdependencypyproject.tomlto track subprocess, execv and fork (the latter two are not available on windows, hence multiple hacks are performed in GithubAction's to dynamically populate the fields inpyproject.toml)coverage run -m pytest ...,coverage combineandcoverage report, so that coverage run starts the tracer at Python startup before any importsThanks to @EdAbati and @dangotbanned - I mostly adjusted from your work, not much more than that!
What type of PR is this? (check all applicable)
Related issues
testingmodule to enable downstream libraries testing #739 (not marking as closing, since we should fully dogfood in the test suite)Checklist