[WIP] AIPMDM-888: Simplify OSS Metaflow core tests to be more Pythonic / pytest-friendly / tox-friendly #3151
Tingting-Chang wants to merge 11 commits into master
Greptile Summary
This is a large, well-executed refactoring that replaces a 643-line custom test runner with standard pytest/tox tooling. All P0/P1 issues from previous review rounds are correctly addressed.
Confidence Score: 5/5
Safe to merge; all P0/P1 issues from previous review rounds are addressed and remaining findings are P2 style suggestions only.
No new P0 or P1 issues found. The three comments are all P2: tempdir cleanup on failure (usability for debugging), sys.path import ordering (bare-pytest ergonomics), and tox deps duplication. These do not affect correctness or CI reliability.
Files flagged: test/core/conftest.py (sys.path ordering before local imports), test/core/test_core_pytest.py (tempdir always deleted, including on failure).
Important Files Changed
Reviews (10): Last reviewed commit: "Passing AnonymousCredentials()"
b4d9457 to 8dc62b3
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
PR Type
Summary
Issue
Implements AIPMDM-888: refactor test/core/ to be compatible with standard Python tooling (pytest, tox) so contributors can run tests without understanding the custom orchestration layer.
R1: Replace `contexts.json` with tox environments and pytest fixtures
This change removes `contexts.json`, which duplicated an environment matrix that tox already supports natively.
- `test/core/tox.ini` is added as the source of truth for core test environments. It defines one `testenv:core-*` per infrastructure backend: local, GCS, Azure, Batch, K8s, Argo, and SFN.
- Each environment is configured through `setenv`, including Metaflow configuration (datastore, metadata service, credentials) and `METAFLOW_CORE_*` control variables (marker, top-level options, executors, disabled tests).
- Shared settings use `{[testenv]setenv}` inheritance, with a `_disabled` section for common disabled-test lists.
- `test/core/conftest.py` now reads all context from `os.environ` and uses `pytest_generate_tests` to parametrize `(graph, test, executor)` combinations. No Python context file is imported anywhere.
- The `METAFLOW_CORE_CHECKS` environment variable is moved into a proper session-scoped `core_checks` fixture, which can now be overridden per directory without touching tox.
- `tox.ini` no longer contains `core-*` envs and now points users to `test/core/tox.ini`.
R2: Eliminate the custom test runner
- `run_tests.py` (643 lines) is deleted. Test execution is now handled entirely by pytest.
- `test/core/conftest.py` now defines `_iter_graphs()` and `_iter_tests()` directly instead of importing them from `run_tests.py`.
- `test/core/test_core_pytest.py` now defines `_run_flow()` directly in place of `run_test()`. It supports `cli`, `api`, and scheduler executors.
This also fixes two existing bugs:
- The `api` executor now catches `RuntimeError` from `Runner.run()` and converts it into a non-zero return code instead of surfacing an unhandled exception.
- The `FileNotFoundError` raised by `open("run-id")` is also addressed.
R3: Convert test flows to standard pytest tests
The core test suite now behaves like normal pytest code instead of relying on a subprocess-heavy custom harness.
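To make the in-process idea concrete, here is a minimal sketch of loading a generated Python file by path and running its check directly, so that a failing `assert` reaches the caller as an ordinary `AssertionError`. The helper names `load_module` and `run_check`, the module-level `check_results()` function, and the file contents are all hypothetical stand-ins; the PR's actual `_run_flow()` is not shown here and may differ.

```python
import importlib.util
import pathlib
import tempfile


def load_module(path, name="test_flow"):
    """Import a Python file by path and return the module object."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_check(path):
    """Load a generated module and run its check in the current process."""
    module = load_module(path)
    # A failing plain `assert` propagates to the caller as AssertionError,
    # rather than being hidden behind a subprocess exit code.
    module.check_results()


# Hypothetical generated file; the real one is produced by FlowFormatter.
SOURCE = (
    "def check_results():\n"
    "    data = 'abc'\n"
    "    assert data == 'abd'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    flow_file = pathlib.Path(tmp) / "test_flow.py"
    flow_file.write_text(SOURCE)
    try:
        run_check(flow_file)
        outcome = "passed"
    except AssertionError:
        outcome = "failed with AssertionError"
```

Under pytest, the `try`/`except` would be unnecessary: the `AssertionError` would simply fail the test and be reported with its traceback.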
- `MetaflowTest` has been renamed to `FlowDefinition` across all 64 test classes and in `metaflow_test/__init__.py`.
- The `Test` suffix is removed because these classes are flow templates combined with graph topologies by `FlowFormatter`, not pytest test cases.
- A `MetaflowTest = FlowDefinition` alias is kept for external compatibility.
- `_run_flow()` dynamically imports the generated `test_flow.py`, instantiates the flow class, and calls `formatter.test.check_results(flow, checker)` directly.
- Failures now surface as `AssertionError`s with full pytest tracebacks instead of opaque subprocess exit codes.
- `FlowFormatter._check_lines()` and `check_code` are removed.
- `MetaflowCheck` no longer depends on `sys.argv`: `run_id` and `cli_options` are now explicit constructor parameters.
- `new_checker` now accepts either a checker class or a checker class name.
- `assert_equals(a, b)` calls across 54 files are replaced with plain `assert a == b`, enabling pytest assertion rewriting and better failure output.
- `assert_exception(lambda: f(), E)` calls are replaced with `pytest.raises(E)` in `tag_mutation`, `merge_artifacts`, `merge_artifacts_include`, and `metadata_check`.
- `assert_equals_metadata` is removed and replaced with inline assertions in `resume_end_step.py`.
R4: Simplify test utilities
Test helpers are reduced to standard pytest patterns wherever possible.
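As a rough illustration of that helper style, the sketch below shows a custom failure type that subclasses `AssertionError` and a checker whose `artifact(step, name)` returns `{task_id: value}`. `ToyChecker` and its in-memory data layout are hypothetical; the real `CliCheck` and `MetadataCheck` read from a Metaflow run and are not reproduced here.

```python
class AssertArtifactFailed(AssertionError):
    """Toy version of a custom failure that pytest treats as an assertion."""


class ToyChecker:
    """Hypothetical in-memory stand-in for CliCheck / MetadataCheck."""

    def __init__(self, artifacts):
        # Assumed layout: {step: {task_id: {artifact_name: value}}}
        self._artifacts = artifacts

    def artifact(self, step, name):
        """Return {task_id: value} for one artifact of one step."""
        tasks = self._artifacts.get(step, {})
        return {
            task_id: values[name]
            for task_id, values in tasks.items()
            if name in values
        }

    def assert_artifact(self, step, name, expected):
        """Check every task of the step using plain assert statements."""
        found = self.artifact(step, name)
        assert found, "no artifact %r in step %r" % (name, step)
        for task_id, value in found.items():
            assert value == expected, "task %s: expected %r, got %r" % (
                task_id, expected, value)


checker = ToyChecker({"start": {"task1": {"data": "abc"}}})

# Direct assertion on the returned mapping, as in the PR description.
assert checker.artifact("start", "data") == {"task1": "abc"}

# Because the custom type subclasses AssertionError, generic handlers
# (and pytest's reporting) pick it up without special casing.
try:
    raise AssertArtifactFailed("artifact mismatch")
except AssertionError as exc:
    caught = exc
```

Returning a plain dictionary keeps the comparison on the test side, which lets pytest's assertion rewriting show the differing keys and values on failure.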
- `assert_equals`, `assert_exception`, and `assert_equals_metadata` are removed from `metaflow_test/__init__.py`.
- `ExpectationFailed`, `AssertArtifactFailed`, `AssertLogFailed`, and `AssertCardFailed` now subclass `AssertionError`, so pytest reports them natively.
- `assert_artifact`, `assert_log`, and `assert_card` are rewritten to use plain `assert` internally rather than manually raising custom exceptions.
- `artifact(step, name)` is added to both `CliCheck` and `MetadataCheck`, returning `{task_id: value}` so tests can make direct assertions such as `assert checker.artifact(step, "data") == {"task1": "abc"}`.
- `test/core/pytest.ini` is added to centralize pytest configuration, including `norecursedirs`, `timeout = 1800`, `addopts = -v --tb=short`, and the seven backend markers. Tox command lines now only need to pass the marker flag and parallelism settings.
R5: tox is now the orchestration layer
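The hand-off between tox and pytest can be sketched roughly as follows: tox exports `METAFLOW_CORE_*` variables through `setenv`, and a `pytest_generate_tests` hook in `conftest.py` turns them into `(graph, test, executor)` parameters. The variable names `METAFLOW_CORE_EXECUTORS` and `METAFLOW_CORE_DISABLED_TESTS`, the `core_combinations` helper, and the graph and test names are assumptions made for this example, not the PR's actual identifiers.

```python
import itertools
import os

# Hypothetical stand-ins for what _iter_graphs() / _iter_tests() discover.
GRAPHS = ["linear", "branch"]
TESTS = ["BasicArtifact", "ResumeEndStep"]


def _split(value):
    """Split a comma-separated environment value into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]


def core_combinations(environ=os.environ):
    """Build (graph, test, executor) triples from environment variables."""
    # Both variable names are assumed for illustration only.
    executors = _split(environ.get("METAFLOW_CORE_EXECUTORS", "cli"))
    disabled = set(_split(environ.get("METAFLOW_CORE_DISABLED_TESTS", "")))
    tests = [name for name in TESTS if name not in disabled]
    return list(itertools.product(GRAPHS, tests, executors))


def pytest_generate_tests(metafunc):
    """pytest hook: parametrize tests that request all three arguments."""
    wanted = {"graph", "test", "executor"}
    if wanted <= set(metafunc.fixturenames):
        combos = core_combinations()
        ids = ["-".join(combo) for combo in combos]
        metafunc.parametrize("graph,test,executor", combos, ids=ids)
```

With `METAFLOW_CORE_EXECUTORS=cli,api` set by a tox environment, each graph/test pair would be collected twice, once per executor, and no separate context file is needed.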
Core test environments can now be run directly with tox, without any custom orchestration layer:
GCS emulator support
This PR also adds first-class support for running against a local GCS emulator.
Test Plan
Runtime:
Commands to run:
# paste exact commands
Where evidence shows up:
Before (error / log snippet)
After (evidence that fix works)
Root Cause
Why This Fix Is Correct
Failure Modes Considered
Tests
Non-Goals
AI Tool Usage