feat(dashboard): enhance dashboard UI and fix Ray runner state reporting #6008
base: main
1. **Frontend Enhancements**:
    * **All Queries Page**: Updated table header to use white background (`bg-white`) with black text and grey separators, improving readability.
    * **Query Detail Page**:
        * Added Entrypoint (command line) and Engine (Swordfish/Flotilla) fields to the metadata section.
        * Added a direct link to the **Ray Dashboard** for Ray-based queries.
        * Improved metadata visibility by using high-contrast text (`text-zinc-100`).
    * **Progress Table**: Refined table headers with dark theme (`bg-zinc-800`), white text, and clear column separators. Added hover effects for better interactivity.
    * **Engine Naming**: Standardized engine display names (Native -> Swordfish, Ray -> Flotilla).
2. **Backend Fixes & Improvements**:
    * **State Management**: Fixed an issue where failed Ray queries were not correctly reporting their terminal state to the dashboard (causing 400 errors). Now allows transitions to Failed state from active states.
    * **Metadata Propagation**: Updated `RayRunner` to capture and transmit `entrypoint` and `ray_dashboard_url` to the dashboard backend.
    * **Python API**: Exposed `repr_json` on `DistributedPhysicalPlan` in `__init__.pyi` to fix mypy errors and support plan visualization.
3. **Code Cleanup**:
    * Removed unused imports and debug logging.
    * Standardized `sys` and `os` imports in `ray_runner.py`.
    * Fixed mypy type definition errors in `daft/__init__.pyi` related to context notification methods.
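
The State Management fix described above can be pictured with a small transition table. This is only a sketch in Python (the actual backend lives in Rust and may be structured differently): the state names Optimizing, Setup, Executing, Finished and Failed come from the dashboard statuses in this PR, while `PENDING`, `NEXT_STATE`, `can_transition` and `handle_transition` are hypothetical names chosen for illustration.

```python
from enum import Enum


class QueryState(Enum):
    PENDING = "Pending"  # hypothetical initial state, not named in the PR
    OPTIMIZING = "Optimizing"
    SETUP = "Setup"
    EXECUTING = "Executing"
    FINISHED = "Finished"
    FAILED = "Failed"


TERMINAL_STATES = {QueryState.FINISHED, QueryState.FAILED}

# Assumed happy-path order; the real backend may differ.
NEXT_STATE = {
    QueryState.PENDING: QueryState.OPTIMIZING,
    QueryState.OPTIMIZING: QueryState.SETUP,
    QueryState.SETUP: QueryState.EXECUTING,
    QueryState.EXECUTING: QueryState.FINISHED,
}


def can_transition(current: QueryState, target: QueryState) -> bool:
    """Allow the next happy-path state, or Failed from any active state."""
    if current in TERMINAL_STATES:
        return False  # terminal states accept no further transitions
    if target is QueryState.FAILED:
        return True  # any non-terminal (active) state may fail
    return NEXT_STATE.get(current) is target


def handle_transition(current: QueryState, target: QueryState) -> int:
    """Return an HTTP-style status code: 200 if accepted, 400 if rejected."""
    return 200 if can_transition(current, target) else 400
```

Under these assumptions, a query that fails while still optimizing is accepted rather than rejected with a 400, which is the behavior the fix is described as restoring.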
Greptile Overview

Greptile Summary

This PR enhances the Daft dashboard with improved UI and fixes critical state reporting issues for Ray queries.

Key Changes

Backend Improvements:
Frontend Enhancements:
Code Quality:

Implementation Notes

The core fix addresses a state machine issue where Ray queries that failed couldn't transition to the Failed state, causing backend 400 errors. The solution makes […] The Ray dashboard URL extraction uses […]

Minor Issues

All findings are non-blocking style/documentation issues (see inline comments for details).

Confidence Score: 4/5
Sequence Diagram

sequenceDiagram
participant User
participant Runner as Runner (Native/Ray)
participant Context as DaftContext
participant Subscriber as DashboardSubscriber
participant Backend as Dashboard Backend
participant Frontend as Dashboard Frontend
User->>Runner: Execute query
Runner->>Context: _notify_query_start(query_id, metadata)
Note over Runner: metadata includes runner, entrypoint, ray_dashboard_url
Context->>Subscriber: on_query_start(query_id, metadata)
Subscriber->>Backend: POST /query/{id}/start
Backend->>Frontend: WebSocket update
Runner->>Context: _notify_optimization_start(query_id)
Context->>Subscriber: on_optimization_start(query_id)
Subscriber->>Backend: POST /query/{id}/plan/start
Backend->>Frontend: WebSocket update (status: Optimizing)
Runner->>Runner: Optimize plan
Runner->>Context: _notify_optimization_end(query_id, optimized_plan)
Context->>Subscriber: on_optimization_end(query_id, plan)
Subscriber->>Backend: POST /query/{id}/plan/end
Backend->>Frontend: WebSocket update (status: Setup)
Runner->>Context: _notify_exec_start(query_id, physical_plan)
Context->>Subscriber: on_exec_start(query_id, physical_plan)
Subscriber->>Backend: POST /query/{id}/exec/start
Backend->>Frontend: WebSocket update (status: Executing)
loop For each result
Runner->>Context: _notify_exec_emit_stats(query_id, node_id, stats)
Context->>Subscriber: on_exec_emit_stats(query_id, stats)
Subscriber->>Backend: POST /query/{id}/exec/op/{op_id}/emit_stats
Backend->>Frontend: WebSocket update (progress data)
end
alt Success
Runner->>Context: _notify_query_end(query_id, Finished)
Context->>Subscriber: on_query_end(query_id, result)
Subscriber->>Backend: POST /query/{id}/end (Finished)
Backend->>Frontend: WebSocket update (status: Finished)
else Failure
Runner->>Context: _notify_query_end(query_id, Failed)
Context->>Subscriber: on_query_end(query_id, result)
Subscriber->>Backend: POST /query/{id}/end (Failed)
Note over Backend: Accepts Failed from Executing state
Backend->>Frontend: WebSocket update (status: Failed)
end
Frontend->>User: Display query status and Ray dashboard link
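
The notification flow in the diagram can be approximated with a small Python simulation. It is a sketch under stated assumptions, not the project's implementation: `RecordingSubscriber` and `run_query` are hypothetical names, the subscriber records request paths instead of sending HTTP, and `on_exec_emit_stats` takes an explicit `op_id` here so the path can be built. The endpoint paths themselves are copied from the diagram.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class RecordingSubscriber:
    """Hypothetical stand-in for DashboardSubscriber; records requests only."""

    def __init__(self) -> None:
        self.requests: List[Tuple[str, str]] = []

    def _post(self, path: str, note: str = "") -> None:
        self.requests.append((f"POST {path}", note))

    def on_query_start(self, query_id: str, metadata: Dict[str, str]) -> None:
        self._post(f"/query/{query_id}/start")

    def on_optimization_start(self, query_id: str) -> None:
        self._post(f"/query/{query_id}/plan/start")

    def on_optimization_end(self, query_id: str, plan: str) -> None:
        self._post(f"/query/{query_id}/plan/end")

    def on_exec_start(self, query_id: str, physical_plan: str) -> None:
        self._post(f"/query/{query_id}/exec/start")

    def on_exec_emit_stats(self, query_id: str, op_id: int, stats: Dict[str, int]) -> None:
        self._post(f"/query/{query_id}/exec/op/{op_id}/emit_stats")

    def on_query_end(self, query_id: str, result: str) -> None:
        self._post(f"/query/{query_id}/end", result)


def run_query(
    sub: RecordingSubscriber,
    query_id: str,
    execute: Callable[[], Iterable[Tuple[int, Dict[str, int]]]],
) -> None:
    """Drive the lifecycle; report Failed if execution raises, else Finished."""
    sub.on_query_start(query_id, {"runner": "Ray"})
    sub.on_optimization_start(query_id)
    sub.on_optimization_end(query_id, plan="optimized")
    sub.on_exec_start(query_id, physical_plan="physical")
    try:
        for op_id, stats in execute():
            sub.on_exec_emit_stats(query_id, op_id, stats)
    except Exception:
        # Failure branch: the terminal state is still reported before re-raising.
        sub.on_query_end(query_id, "Failed")
        raise
    sub.on_query_end(query_id, "Finished")
```

The point of the `try`/`except` is that the end notification is sent on both branches, so the backend always receives a terminal state for the query.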
5 files reviewed, 5 comments

    # Optimize the logical plan.
    ctx._notify_query_start(query_id, PyQueryMetadata(output_schema._schema, builder.repr_json()))
    import sys
[P2] Import statements should be at the top of the file. Move `import sys` to the imports section at the beginning of the file (around line 3-4) per the project's import style guidelines.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Path: daft/runners/native_runner.py
Line: 99:99
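
One way to address this comment is to keep `import sys` at module level and isolate the metadata capture in a helper. The sketch below is illustrative only: `build_query_metadata` is a hypothetical function, building the entrypoint with `shlex.join(sys.argv)` is an assumption about what "command line" means here, and the Ray dashboard URL is passed in as a parameter rather than looked up, since how the PR extracts it is not shown.

```python
import shlex
import sys
from typing import Dict, Optional, Sequence


def build_query_metadata(
    runner: str,
    argv: Optional[Sequence[str]] = None,
    ray_dashboard_url: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the fields the dashboard displays for a query (hypothetical helper)."""
    args = list(sys.argv if argv is None else argv)
    metadata = {
        "runner": runner,
        # Assumption: the entrypoint is the shell-quoted command line.
        "entrypoint": shlex.join(args),
    }
    if ray_dashboard_url:
        # Only Ray-based queries carry a dashboard link.
        metadata["ray_dashboard_url"] = ray_dashboard_url
    return metadata
```

Accepting `argv` as an optional argument keeps the helper testable without touching the real `sys.argv`.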

    # Try to emit stats for operator 0?
    # If NativeExecutor doesn't start operator 0, this might fail or be ignored.
    # Let's comment it out for now to see baseline behavior.
    ctx._notify_exec_emit_stats(query_id, 0, {"rows in": total_rows, "rows out": total_rows})
[P3] The comment says "Let's comment it out for now to see baseline behavior" but the code on line 129 is actually executing. Either remove this misleading comment or comment out the code if it was intended to be disabled.
Path: daft/runners/native_runner.py
Line: 126:129

    tracing::info!("Received exec_start for query {}", query_id);
    let query_info = state.queries.get_mut(&query_id);
    let Some(mut query_info) = query_info else {
        tracing::error!("Query {} not found in exec_start", query_id);
        return StatusCode::BAD_REQUEST;
    };

    // Debug state
    tracing::info!("Query {} state: {:?}", query_id, query_info.state);
[P3] Debug logging statements appear to have been left in from development. The comment "Debug state" at line 181 and `tracing::info!` calls at lines 174 and 182 look like temporary debugging code. Consider removing or converting to `tracing::debug!` if this level of verbosity isn't needed in production.
Path: src/daft-dashboard/src/engine.rs
Line: 174:182

    // @ts-ignore
    columnHelper.accessor("ray_dashboard_url", {
[P2] Using `@ts-ignore` suppresses TypeScript type checking instead of fixing the underlying type issue. The `ray_dashboard_url` property is properly defined as optional in the `QuerySummary` type. Consider properly typing the accessor or using `@ts-expect-error` with a specific explanation if this is a known limitation of the column helper library.
Path: src/daft-dashboard/frontend/src/app/queries/page.tsx
Line: 123:124

    import daft.datasets as datasets
    import daft.functions as functions
    import daft.gravitino as gravitino
    # import daft.gravitino as gravitino
[P2] The gravitino import is commented out but this change isn't mentioned in the PR description. This appears to be an unrelated change. If gravitino support is being removed or temporarily disabled, it should be documented in the PR description or done in a separate commit.
Path: daft/__init__.py
Line: 155:155