
[WIP] AIPMDM-888: Simplify OSS Metaflow core tests to be more Pythonic / pytest-friendly / tox-friendly#3151

Open
Tingting-Chang wants to merge 11 commits into master from AIPMDM-888


Tingting-Chang commented Apr 27, 2026

PR Type

  • Bug fix
  • New feature
  • Core Runtime change (higher bar -- see CONTRIBUTING.md)
  • Docs / tooling
  • Refactoring following JIRA

Summary

Issue

Implements AIPMDM-888: refactor test/core/ to be compatible with standard Python tooling (pytest, tox) so contributors can run tests without understanding the custom orchestration layer.

R1: Replace contexts.json with tox environments and pytest fixtures

This change removes contexts.json, which duplicated an environment matrix that tox already supports natively.

  • test/core/tox.ini is added as the source of truth for core test environments. It defines one testenv:core-* per infrastructure backend: local, GCS, Azure, Batch, K8s, Argo, and SFN.
  • Each tox env now defines its test context through setenv, including Metaflow configuration (datastore, metadata service, credentials) and METAFLOW_CORE_* control variables (marker, top-level options, executors, disabled tests).
  • Shared settings are deduplicated through {[testenv]setenv} inheritance, with a _disabled section for common disabled-test lists.
  • test/core/conftest.py now reads all context from os.environ and uses pytest_generate_tests to parametrize (graph, test, executor) combinations. No Python context file is imported anywhere.
  • Checker selection has moved from the METAFLOW_CORE_CHECKS environment variable into a proper session-scoped core_checks fixture, which can now be overridden per directory without touching tox.
  • The root tox.ini no longer contains core-* envs and now points users to test/core/tox.ini.
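
The environment-driven parametrization described above can be sketched roughly as follows. This is a minimal illustration, not the PR's conftest.py: the variable names METAFLOW_CORE_EXECUTORS and METAFLOW_CORE_DISABLED_TESTS, the helper build_params, the fixture name core_case, and the placeholder graph and test names are all assumptions.

```python
import itertools
import os


def _split(value):
    """Split a comma-separated env value into a list, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_params(environ, graphs, tests):
    """Return (graph, test, executor) triples selected by the environment.

    The variable names below are hypothetical stand-ins for the
    METAFLOW_CORE_* control variables set by each tox env.
    """
    executors = _split(environ.get("METAFLOW_CORE_EXECUTORS", "cli"))
    disabled = set(_split(environ.get("METAFLOW_CORE_DISABLED_TESTS", "")))
    enabled = [t for t in tests if t not in disabled]
    return list(itertools.product(graphs, enabled, executors))


def pytest_generate_tests(metafunc):
    """pytest hook: parametrize any test that requests `core_case`."""
    if "core_case" in metafunc.fixturenames:
        # Placeholder names; a real suite would discover graphs and tests.
        params = build_params(os.environ, ["linear", "foreach"], ["basic_artifact"])
        ids = ["-".join(p) for p in params]
        metafunc.parametrize("core_case", params, ids=ids)
```

Keeping the selection logic in a plain function that takes a mapping makes it easy to exercise without tox, for example build_params({"METAFLOW_CORE_EXECUTORS": "cli,api"}, ["linear"], ["basic"]).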

R2: Eliminate the custom test runner

  • run_tests.py (643 lines) is deleted. Test execution is now handled entirely by pytest.

  • test/core/conftest.py now defines _iter_graphs() and _iter_tests() directly instead of importing them from run_tests.py.

  • test/core/test_core_pytest.py now defines _run_flow() directly in place of run_test(). It supports cli, api, and scheduler executors.

  • This also fixes two existing bugs:

    • the api executor now catches RuntimeError from Runner.run() and converts it into a non-zero return code instead of surfacing an unhandled exception
    • the resume path now returns early when the resume subprocess fails, avoiding a follow-on FileNotFoundError from open("run-id")
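
The two fixes above follow a common pattern, sketched here under stated assumptions. The callables run_callable and resume_callable are hypothetical stand-ins for Runner.run() and the resume subprocess, and the function names are illustrative rather than taken from test_core_pytest.py.

```python
def run_api(run_callable):
    """Run a flow through an API-style callable and return an exit code.

    `run_callable` stands in for something like Runner(...).run(); a
    RuntimeError becomes a non-zero code instead of propagating.
    """
    try:
        run_callable()
    except RuntimeError:
        return 1
    return 0


def resume_and_read_run_id(resume_callable, run_id_path="run-id"):
    """Resume a flow, then read the run id it is expected to write.

    `resume_callable` is a hypothetical stand-in that returns the exit
    code of the resume subprocess. On failure we return early, so the
    run-id file (which may not exist) is never opened.
    """
    code = resume_callable()
    if code != 0:
        return code, None
    with open(run_id_path) as f:
        return 0, f.read().strip()
```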

R3: Convert test flows to standard pytest tests

The core test suite now behaves like normal pytest code instead of relying on a subprocess-heavy custom harness.

  • MetaflowTest has been renamed to FlowDefinition across all 64 test classes and in metaflow_test/__init__.py.
  • The Test suffix is removed because these classes are flow templates combined with graph topologies by FlowFormatter, not pytest test cases.
  • A MetaflowTest = FlowDefinition alias is kept for external compatibility.
  • Verification now runs in-process instead of in a second subprocess. _run_flow() dynamically imports the generated test_flow.py, instantiates the flow class, and calls formatter.test.check_results(flow, checker) directly.
  • Check failures now surface as normal AssertionErrors with full pytest tracebacks instead of opaque subprocess exit codes.
  • FlowFormatter._check_lines() and check_code are removed.
  • MetaflowCheck no longer depends on sys.argv: run_id and cli_options are now explicit constructor parameters.
  • new_checker now accepts either a checker class or a checker class name.
  • 243 assert_equals(a, b) calls across 54 files are replaced with plain assert a == b, enabling pytest assertion rewriting and better failure output.
  • 10 uses of assert_exception(lambda: f(), E) are replaced with pytest.raises(E) in tag_mutation, merge_artifacts, merge_artifacts_include, and metadata_check.
  • assert_equals_metadata is removed and replaced with inline assertions in resume_end_step.py.
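
In-process verification of this kind can be approximated with importlib, as in the sketch below. It assumes the flow class can be instantiated with no arguments, which may not hold for a real Metaflow flow; load_flow_module and verify_in_process are illustrative names, not functions from the PR.

```python
import importlib.util
import sys


def load_flow_module(path, module_name="test_flow"):
    """Import a generated flow file from an explicit filesystem path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


def verify_in_process(path, flow_class_name, check_results, checker):
    """Load the flow class, instantiate it, and run the check in-process.

    An AssertionError raised by `check_results` propagates unchanged, so
    pytest can report the failing comparison with a full traceback.
    """
    module = load_flow_module(path)
    flow_class = getattr(module, flow_class_name)
    flow = flow_class()  # assumption: a no-argument constructor
    check_results(flow, checker)
```

Because nothing is wrapped in a subprocess, a failing assert inside check_results reaches the test function directly rather than being reduced to an exit code.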

R4: Simplify test utilities

Test helpers are reduced to standard pytest patterns wherever possible.

  • assert_equals, assert_exception, and assert_equals_metadata are removed from metaflow_test/__init__.py.
  • ExpectationFailed, AssertArtifactFailed, AssertLogFailed, and AssertCardFailed now subclass AssertionError, so pytest reports them natively.
  • assert_artifact, assert_log, and assert_card are rewritten to use plain assert internally rather than manually raising custom exceptions.
  • artifact(step, name) is added to both CliCheck and MetadataCheck, returning {task_id: value} so tests can make direct assertions such as assert checker.artifact(step, "data") == {"task1": "abc"}.
  • test/core/pytest.ini is added to centralize pytest configuration, including norecursedirs, timeout = 1800, addopts = -v --tb=short, and the seven backend markers. Tox command lines now only need to pass the marker flag and parallelism settings.
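
To illustrate the helper shapes described above, here is a toy sketch. DictChecker is a hypothetical in-memory stand-in for CliCheck and MetadataCheck, which query a real run; only the exception names and the {task_id: value} return shape come from the description.

```python
class ExpectationFailed(AssertionError):
    """Subclassing AssertionError lets pytest report it like a failed assert."""


class AssertArtifactFailed(AssertionError):
    """Same idea for artifact checks."""


class DictChecker:
    """Toy checker backed by {(step, task_id): {artifact_name: value}}."""

    def __init__(self, tasks):
        self._tasks = tasks

    def artifact(self, step, name):
        """Return {task_id: value} for every task of `step` that has `name`."""
        return {
            task_id: artifacts[name]
            for (task_step, task_id), artifacts in self._tasks.items()
            if task_step == step and name in artifacts
        }

    def assert_artifact(self, step, name, value):
        """Check every task of `step` using plain assert statements."""
        found = self.artifact(step, name)
        assert found, f"no artifact {name!r} in step {step!r}"
        for task_id, actual in found.items():
            assert actual == value, f"{step}/{task_id}: {actual!r} != {value!r}"
```

A test can then compare the returned mapping directly, for example assert checker.artifact("start", "data") == {"task1": "abc"}.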

R5: tox is now the orchestration layer

Core test environments can now be run directly with tox, without any custom orchestration layer:

tox -c test/core/tox.ini -e core-local   # local filesystem
tox -c test/core/tox.ini -e core-gcs     # GCS via fake-gcs-server
tox -c test/core/tox.ini -e core-azure   # Azure Blob via Azurite
tox -c test/core/tox.ini -e core-batch   # AWS Batch via localbatch + MinIO

GCS emulator support

This PR also adds first-class support for running against a local GCS emulator.

  • metaflow/plugins/gcp/gs_storage_client_factory.py now creates an anonymous storage.Client() when STORAGE_EMULATOR_HOST is set, instead of calling google.auth.default(). This allows flows to run against fake-gcs-server without real GCP credentials.
  • devtools/ now includes fake-gcs-server as a first-class service, with Kubernetes deployment and service definitions, bucket-init job, secret, a dedicated Tilt file, and integration into the main Tiltfile and pick_services.sh.
  • The emulator can be started with SERVICES_OVERRIDE=fake-gcs-server make up.
  • core-gcs and core-azure now set METAFLOW_DEFAULT_DATASTORE=gs/azure along with the corresponding sysroot and endpoint variables, so flows actually exercise cloud storage code paths against local emulators. This matches the existing core-batch pattern with MinIO.
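
The emulator switch can be sketched as below. This is not the code in gs_storage_client_factory.py: the function names and the project placeholder "test-project" are assumptions, and the google imports are deferred so the module loads even without google-cloud-storage installed.

```python
import os


def emulator_host(environ=os.environ):
    """Return the emulator endpoint if STORAGE_EMULATOR_HOST is set, else None."""
    host = environ.get("STORAGE_EMULATOR_HOST", "").strip()
    return host or None


def make_gs_client(environ=os.environ, project="test-project"):
    """Create a GCS client, anonymous when an emulator is configured.

    `project="test-project"` is a placeholder, not a value from the PR.
    """
    from google.cloud import storage

    if emulator_host(environ):
        from google.auth.credentials import AnonymousCredentials

        # Skip the Application Default Credentials lookup entirely.
        return storage.Client(credentials=AnonymousCredentials(), project=project)

    import google.auth

    credentials, default_project = google.auth.default()
    return storage.Client(credentials=credentials, project=default_project)
```

Separating the environment check from client construction keeps the decision testable without network access or GCP libraries.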

Test Plan

tox -c test/core/tox.ini -e core-local — 470 tests collected and passing
tox -c test/core/tox.ini -e core-gcs — requires fake-gcs-server at localhost:4443 (SERVICES_OVERRIDE=fake-gcs-server make up)
tox -c test/core/tox.ini -e core-azure — requires Azurite at localhost:10000
tox -c test/core/tox.ini -e core-batch, core-k8s, core-argo, core-sfn — require the full devtools stack


Tests

  • Unit tests added/updated
  • Reproduction script provided (required for Core Runtime)
  • CI passes
  • If tests are impractical: explain why below and provide manual evidence above

Non-Goals

AI Tool Usage

  • No AI tools were used in this contribution
  • [ X ] AI tools were used (describe below)
    • Claude Code


greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This is a large, well-executed refactoring that replaces a 643-line custom test runner with standard pytest/tox tooling. All P0/P1 issues from previous review rounds are correctly addressed: import pytest is now present in tag_mutation.py and emitted by formatter.py into generated flows, gs_storage_client_factory.py uses AnonymousCredentials explicitly, and core-azure/core-gcs tox envs correctly set --datastore=azure/gs.

Confidence Score: 5/5

Safe to merge; all P0/P1 issues from previous review rounds are addressed and remaining findings are P2 style suggestions only.

No new P0 or P1 issues found. The three comments are all P2: tempdir cleanup on failure (usability for debugging), sys.path import ordering (bare-pytest ergonomics), and tox deps duplication. These do not affect correctness or CI reliability.

test/core/conftest.py (sys.path ordering before local imports), test/core/test_core_pytest.py (tempdir always deleted including on failure)

Important Files Changed

| Filename | Overview |
| --- | --- |
| test/core/conftest.py | Rewrites pytest_generate_tests to parametrize from env vars; sys.path setup is after the top-level metaflow_test import, which can break bare pytest invocations from the repo root. |
| test/core/test_core_pytest.py | Main test runner rewrite; correctly fixes api executor RuntimeError and resume early-return; tempdir is always deleted in the finally block including on failure, losing the generated test_flow.py. |
| test/core/metaflow_test/formatter.py | Now emits import pytest in generated flow code, fixing NameError for merge_artifacts / merge_artifacts_include step bodies. |
| test/core/metaflow_test/__init__.py | Renames MetaflowTest to FlowDefinition with backward-compat alias; removes assert_equals/assert_exception helpers; exception subclasses now inherit AssertionError. |
| test/core/tox.ini | New tox config; core-azure and core-gcs now correctly set cloud datastores and credentials; core-azure duplicates deps section unnecessarily. |
| metaflow/plugins/gcp/gs_storage_client_factory.py | Correctly uses AnonymousCredentials when STORAGE_EMULATOR_HOST is set, avoiding DefaultCredentialsError in CI without ADC. |
| test/core/metaflow_test/cli_check.py | Converts raise-based assertions to plain assert statements; adds artifact() helper returning {task_id: value}; logic is equivalent and correct. |
| test/core/metaflow_test/metadata_check.py | Same assertion rewrite as cli_check; now imports pytest at top level for pytest.raises in check_setup_credentials. |
| test/core/tests/tag_mutation.py | Now correctly imports pytest at the top; all pytest.raises usages in check_results are valid. |
| test/core/tests/merge_artifacts.py | pytest.raises replaces assert_exception in step bodies; import pytest is present; formatter now emits import pytest so generated flow code is correct. |
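
The overview for test/core/test_core_pytest.py notes that the tempdir is always deleted, including on failure. One possible way to keep the directory when a test fails is sketched below; run_in_tempdir and the core_test_ prefix are hypothetical and not part of the PR.

```python
import shutil
import tempfile


def run_in_tempdir(body, keep_on_failure=True):
    """Call body(tempdir); delete the directory only if body succeeds.

    When body raises and keep_on_failure is true, the directory is left
    in place and its path is printed so generated files can be inspected.
    """
    tempdir = tempfile.mkdtemp(prefix="core_test_")
    try:
        result = body(tempdir)
    except BaseException:
        if keep_on_failure:
            print(f"kept failing test directory: {tempdir}")
        else:
            shutil.rmtree(tempdir, ignore_errors=True)
        raise
    shutil.rmtree(tempdir, ignore_errors=True)
    return result
```

The trade-off is that failed runs leave directories behind, so a CI job using this pattern would likely want keep_on_failure=False or a cleanup step.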

Reviews (10): Last reviewed commit: "Passing AnonymousCredentials()"

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@Tingting-Chang Tingting-Chang changed the title [WIP] Simplify OSS Metaflow core tests to be more Pythonic / pytest-friendly / tox-friendly [WIP] AIPMDM-888: Simplify OSS Metaflow core tests to be more Pythonic / pytest-friendly / tox-friendly Apr 28, 2026