Skip to content

Commit 21954b3

Browse files
thedavekwonmeta-codesync[bot]
authored andcommitted
remove direct usage of start_telemetry (#4048)
Summary: Pull Request resolved: #4048 Use job-level telemetry in tests, examples, and docs in preparation of per-host telemetry actor RFC; default dashboard port to `0`. Reviewed By: shayne-fletcher Differential Revision: D106130989 fbshipit-source-id: 71bda3f7a907ec50e93c9581007af4ae456dc6cc
1 parent 6b57bef commit 21954b3

7 files changed

Lines changed: 189 additions & 167 deletions

File tree

docs/source/monarch-dashboard.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,8 @@ full DAG visualization.
99
> **Note** — The Monarch Dashboard is in early development and may change
1010
> significantly between releases.
1111
12-
The dashboard is included in the `torchmonarch` PyPI package. When you call
13-
`start_telemetry(include_dashboard=True)`, it starts a local web server that
12+
The dashboard is included in the `torchmonarch` PyPI package. When a job enables
13+
`TelemetryConfig(include_dashboard=True)`, Monarch starts a local web server that
1414
serves the dashboard UI.
1515

1616
## Quick Start

examples/distributed_telemetry.py

Lines changed: 3 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
1111
This example demonstrates querying real tracing data collected from actors:
1212
13-
1. Starts telemetry
13+
1. Enables job-level telemetry
1414
2. Spawns actors that do work (generating real tracing events)
1515
3. Queries the spans, span_events, events, and actors tables
1616
@@ -36,7 +36,6 @@
3636

3737
import pyarrow as pa
3838
from monarch.actor import Actor, endpoint
39-
from monarch.distributed_telemetry.actor import start_telemetry
4039
from monarch.job import ProcessJob, TelemetryConfig
4140

4241

@@ -470,10 +469,7 @@ def run_workload(job, summary=False, interactive=False):
470469
"""Run the full telemetry demo: spawn actors, run work, query, and shut down.
471470
472471
Args:
473-
job: JobTrait whose state has a "workers" HostMesh.
474-
If the job was created with ``telemetry=TelemetryConfig()``, the
475-
query engine is available via ``state.query_engine`` and
476-
``start_telemetry()`` does not need to be called separately.
472+
job: JobTrait whose state has a "workers" HostMesh and telemetry enabled.
477473
summary: If True, print summary output instead of full tables.
478474
interactive: If True, pause after setup so the dashboard can be browsed.
479475
"""
@@ -482,11 +478,9 @@ def run_workload(job, summary=False, interactive=False):
482478

483479
state = job.state(cached_path=None)
484480

485-
# Use engine from JobState if available (telemetry configured on job),
486-
# otherwise fall back to manual start_telemetry() for backward compat.
487481
engine = state.query_engine
488482
if engine is None:
489-
engine, _, _scanner = start_telemetry()
483+
raise RuntimeError("run_workload requires job.enable_telemetry(...)")
490484

491485
hosts = state.hosts
492486

hyperactor_mesh/test/mesh_admin_integration/telemetry.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
//!
1212
//! These routes proxy to the Monarch dashboard and require
1313
//! `telemetry_url` to be configured. The Python dining_philosophers
14-
//! binary is launched with `--dashboard` so that `start_telemetry`
14+
//! binary is launched with `--dashboard` so that job-level telemetry
1515
//! starts the dashboard and passes `telemetry_url` to `_spawn_admin`.
1616
1717
use std::time::Duration;

python/monarch/_src/job/job.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020
from abc import ABC, abstractmethod
2121
from dataclasses import dataclass
2222
from pathlib import Path
23-
from typing import Any, Dict, List, Literal, NamedTuple, Optional, Sequence
23+
from typing import Any, Dict, List, Literal, NamedTuple, Optional, Self, Sequence
2424

2525
from monarch._rust_bindings.monarch_hyperactor.host_mesh import PyMeshAdminRef
2626
from monarch._src.actor.bootstrap import attach_to_workers
@@ -522,7 +522,7 @@ def _wrap_state(self, job_state: JobState) -> JobState:
522522

523523
def enable_telemetry(
524524
self, config: "Optional[TelemetryConfig]" = None, **kwargs
525-
) -> "JobTrait":
525+
) -> Self:
526526
"""Configure automatic telemetry startup on the next :meth:`state` call.
527527
528528
Args:
@@ -537,7 +537,7 @@ def enable_telemetry(
537537

538538
def enable_admin(
539539
self, config: "Optional[MeshAdminConfig]" = None, **kwargs
540-
) -> "JobTrait":
540+
) -> Self:
541541
"""Configure automatic mesh admin agent startup on the next :meth:`state` call.
542542
543543
Args:

python/monarch/distributed_telemetry/__init__.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,11 @@
1515
3. QueryEngine (Rust): DataFusion query execution
1616
1717
Usage:
18-
from monarch.distributed_telemetry.actor import start_telemetry
18+
from monarch.job import ProcessJob, TelemetryConfig
1919
20-
engine, telemetry_url, scanner = start_telemetry()
20+
state = ProcessJob({"hosts": 1}).enable_telemetry(TelemetryConfig()).state()
21+
engine = state.query_engine
22+
assert engine is not None
2123
# ... spawn procs, they're automatically tracked ...
2224
result = engine.query("SELECT * FROM metrics")
2325
"""

python/monarch/monarch_dashboard/server/query_engine_adapter.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
"""Production adapter: wraps the Monarch DataFusion QueryEngine.
88
99
Unlike the SQLite-based db.py (local dev/testing), this connects directly
10-
to the live telemetry engine started by start_telemetry(). The QueryEngine
10+
to the live telemetry engine attached to a job state. The QueryEngine
1111
uses DataFusion as its SQL planner/executor and returns pyarrow Tables.
1212
"""
1313

@@ -25,8 +25,10 @@ class QueryEngineAdapter(DBAdapter):
2525
2626
Usage::
2727
28-
from monarch.distributed_telemetry.actor import start_telemetry
29-
engine, _, _scanner = start_telemetry()
28+
from monarch.job import ProcessJob, TelemetryConfig
29+
state = ProcessJob({"hosts": 1}).enable_telemetry(TelemetryConfig()).state()
30+
engine = state.query_engine
31+
assert engine is not None
3032
adapter = QueryEngineAdapter(engine)
3133
rows = adapter.query("SELECT * FROM actors LIMIT 10")
3234
"""

0 commit comments

Comments
 (0)