feat: check_models external readiness check #712
Greptile Summary
This PR extracts the model and MCP tool health-check logic from
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/readiness.py | New module; correctly lifts model and MCP tool health-check logic out of DatasetBuilder, parameterized on ClientConcurrencyMode. Logic is functionally equivalent to the removed methods. |
| packages/data-designer-engine/src/data_designer/engine/flags.py | New module centralising DATA_DESIGNER_ASYNC_ENGINE flag; clean single source of truth for tests to monkeypatch. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Removes _run_model_health_check_if_needed and _run_mcp_tool_check_if_needed; replaces both call sites with run_readiness_check. _use_async is now resolved before the health check, making client_concurrency_mode consistent with actual execution mode. |
| packages/data-designer/src/data_designer/interface/data_designer.py | Adds check_models method; delegates to run_readiness_check with the same resolved client_concurrency_mode used by the workload path. |
| packages/data-designer/src/data_designer/cli/controllers/generation_controller.py | Adds run_check_models; mirrors run_validate pattern with typed-error class name surfaced in the error message for easier user diagnosis. |
| packages/data-designer/src/data_designer/cli/commands/check_models.py | New CLI command; thin delegation to GenerationController.run_check_models, mirrors validate.py structure. |
| scripts/health_checks.py | Simplified to use DataDesigner.check_models; loses the 2-attempt retry for chat completion and explicit embedding-vector size validation present in the old direct ModelFacade approach. |
| packages/data-designer-engine/tests/engine/test_readiness.py | Comprehensive new test file covering model probe, MCP probe, ordering, async dispatch (including timeout/cancel), and all column-type coverage paths. |
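The flags.py row above describes a single place where the async engine flag is read, so tests have one attribute to patch. A minimal sketch of that pattern follows. The `flags` namespace here is a stand-in for the real module, and the rule that `"1"` means enabled is an assumption for illustration, not something the PR states.

```python
import os
import types

# Hypothetical stand-in for flags.py: the env var is read in exactly one place.
# Treating "1" as "enabled" is an assumption made for this sketch.
flags = types.SimpleNamespace(
    DATA_DESIGNER_ASYNC_ENGINE=os.environ.get("DATA_DESIGNER_ASYNC_ENGINE", "0") == "1"
)


def pick_engine() -> str:
    # Look the flag up through the namespace at call time, so a test that
    # swaps the attribute is observed here without touching os.environ.
    return "async" if flags.DATA_DESIGNER_ASYNC_ENGINE else "sync"
```

Callers that copy the value at import time (`from flags import DATA_DESIGNER_ASYNC_ENGINE`) would not see a later patch, which is why the sketch reads it through the namespace.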
Sequence Diagram
sequenceDiagram
participant User
participant CLI as check-models CLI
participant Ctrl as GenerationController
participant DD as DataDesigner
participant RP as ResourceProvider
participant R as readiness.py
participant MR as ModelRegistry
participant MCP as MCPRegistry
User->>CLI: dd check-models config.yaml
CLI->>Ctrl: run_check_models(config_source)
Ctrl->>Ctrl: _load_config(config_source)
Ctrl->>DD: DataDesigner()
Ctrl->>DD: check_models(config_builder)
DD->>RP: _create_resource_provider("check-models", config_builder)
DD->>DD: _resolve_client_concurrency_mode(config_builder)
DD->>R: run_readiness_check(columns, resource_provider, client_concurrency_mode)
R->>MR: run_health_check / arun_health_check(model_aliases)
MR-->>R: OK / raises ModelError
R->>MCP: run_health_check(tool_aliases)
MCP-->>R: OK / raises RuntimeError
R-->>DD: None (success)
DD-->>Ctrl: None
Ctrl->>User: All models and tools responded successfully
Note over DD,R: Same run_readiness_check path used by DatasetBuilder.build() / build_preview()
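The ordering in the diagram (model registry first, then MCP registry, with failures surfacing as exceptions) can be sketched with fakes. Everything below is illustrative: `readiness_sketch`, `RecordingRegistry`, and the simplified signature are assumptions, not the real `run_readiness_check`, which takes columns, a resource provider, and a client concurrency mode.

```python
class ModelError(Exception):
    """Stand-in for the engine's model error type (assumed for this sketch)."""


class RecordingRegistry:
    """Fake registry that records calls and optionally fails."""

    def __init__(self, name, log, fail_with=None):
        self.name = name
        self.log = log
        self.fail_with = fail_with

    def run_health_check(self, aliases):
        self.log.append((self.name, list(aliases)))
        if self.fail_with is not None:
            raise self.fail_with


def readiness_sketch(model_aliases, tool_aliases, model_registry, mcp_registry):
    # Models are probed first; an exception here propagates, so the MCP
    # probe is never reached in this sketch.
    if model_aliases:
        model_registry.run_health_check(model_aliases)
    if tool_aliases:
        mcp_registry.run_health_check(tool_aliases)
```

Returning `None` on success and raising on failure matches the diagram's "None (success)" and "raises ..." arrows; whether the real implementation short-circuits in exactly this way is not confirmed by the page.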
Reviews (4): Last reviewed commit: "align readiness with client mode"
6ef34db to 88f1314
    if not model_aliases:
        return

    if flags.DATA_DESIGNER_ASYNC_ENGINE:
small edge case: check_models() can build sync-mode clients when the config forces sync fallback, like allow_resize=True, but readiness still branches on the raw async env flag and calls arun_health_check(). I know allow_resize / sync are probably on the way out, so maybe this is just a known limitation, but it might be worth either matching the resolved client mode here or adding a regression/explicit guard so users do not hit an async/sync client internals error.
Interesting, I overlooked this, thanks. Addressed in c9c118f. This should be fairly straightforward to unwind if we fully drop the sync engine.
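The fix discussed in this thread amounts to branching on the resolved client mode instead of the raw env flag. A minimal sketch under stated assumptions: `ClientConcurrencyMode` is named in the PR, but its `SYNC`/`ASYNC` members are guessed here, `FakeModelRegistry` is a test double, and `asyncio.run` stands in for whatever loop handling the engine actually uses.

```python
import asyncio
from enum import Enum


class ClientConcurrencyMode(Enum):
    # Member names are assumptions for illustration.
    SYNC = "sync"
    ASYNC = "async"


class FakeModelRegistry:
    """Test double exposing the two health-check entry points from the diagram."""

    def __init__(self):
        self.calls = []

    def run_health_check(self, aliases):
        self.calls.append(("sync", list(aliases)))

    async def arun_health_check(self, aliases):
        self.calls.append(("async", list(aliases)))


def run_model_health_check(registry, model_aliases, mode):
    if not model_aliases:
        return
    # Branch on the mode the clients were actually built with, so a config
    # that forces sync fallback never reaches the async probe.
    if mode is ClientConcurrencyMode.ASYNC:
        asyncio.run(registry.arun_health_check(model_aliases))
    else:
        registry.run_health_check(model_aliases)
```

Passing the mode in as a parameter keeps the readiness code unaware of how the mode was resolved, which should make the branch easy to delete if the sync engine is dropped.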
    fake_loop = Mock()

    with (
        patch("data_designer.engine.readiness.ensure_async_engine_loop", return_value=fake_loop, create=True),
suggestion: this patch target does not look like it is used. _run_model_health_check() imports ensure_async_engine_loop inside the function from data_designer.engine.dataset_builders.utils.async_concurrency, so patching data_designer.engine.readiness.ensure_async_engine_loop will not intercept it. Could patch the source path instead and drop create=True; same thing applies to the timeout test below.
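The behaviour described in this suggestion can be reproduced in isolation. The sketch below builds two throwaway modules (`source_mod` and `consumer_mod`, both hypothetical names) so that the consumer imports a helper inside a function body, then compares patching the consumer's namespace with patching the source module.

```python
import sys
import types
from unittest.mock import Mock, patch

# "source_mod" defines the helper; "consumer_mod" imports it inside a function,
# mirroring a function-local import. Both modules are made up for this sketch.
source_mod = types.ModuleType("source_mod")
source_mod.ensure_loop = lambda: "real loop"
sys.modules["source_mod"] = source_mod

consumer_mod = types.ModuleType("consumer_mod")
exec(
    "def check():\n"
    "    from source_mod import ensure_loop\n"
    "    return ensure_loop()\n",
    consumer_mod.__dict__,
)
sys.modules["consumer_mod"] = consumer_mod

fake_loop = Mock(name="fake_loop")

# Patching the consumer's namespace only works with create=True (the name does
# not exist there) and is never consulted by the function-local import.
with patch("consumer_mod.ensure_loop", return_value=fake_loop, create=True):
    result_consumer_patch = consumer_mod.check()

# Patching the source module is what the local import actually resolves.
with patch("source_mod.ensure_loop", return_value=fake_loop):
    result_source_patch = consumer_mod.check()
```

Here `result_consumer_patch` is the real helper's return value and `result_source_patch` is `fake_loop`. Needing `create=True` is itself a hint that the target name does not exist where the patch is aimed.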
I think this new readiness path could eventually replace the implementation in
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
c4715f9 to c9c118f
@andreatgretel I updated the health_checks script to use this new method in b3a9d88, LMK what you think!
📋 Summary
Introduces a new check_models CLI command and DataDesigner interface method for checking external readiness of models and tools without triggering a full workload (preview or create). This is the "external deps" analogue to the existing validate functionality (internal coherence).
🔗 Related Issue
N/A, direct to PR
🔄 Changes
- Health-check logic extracted from DatasetBuilder to a standalone readiness.py engine module. Used by both the builder and the end user-facing interfaces.
- flags.py engine module where the async engine flag is centralized. This cleans up some duplication of the env var magic string / constant that was floating around in a few places.
🧪 Testing
make test passes
✅ Checklist